Opened 4 years ago

Closed 3 years ago

#86 closed defect (fixed)

`makedirs` fails if an intermediate directory doesn't show contents

Reported by: schwa Owned by: schwa
Priority: major Milestone: 3.3
Component: Library Version: 3.2
Keywords: mkdir, makedirs, virtual directories, iis Cc:

Description (last modified by schwa)

Reported by Roger Demetrescu:

I've find an issue with ftputil 3.2 (well, it happens with 2.8 and 3.0 too) when trying to use host.makedirs() withing a FTP server that doesnt show any directories/files under de root ('/')

My customer gave me this directory to work:

/aaa/bbb/ccc

When I try to create this directory: ./ddd/eee/ FTPHost.makedirs() try do create those directories:

/aaa
/aaa/bbb
/aaa/bbb/ccc
/aaa/bbb/ccc/ddd
/aaa/bbb/ccc/ddd/eee

Of course, /aaa already exists, and the ftputil does this check to see if it should reraise the exception:

    if not self.path.isdir(next_directory):

The problem is: I am able to do a host.chdir('/aaa/bbb/ccc') but if I do a host.chdir('/') and try to list the files and directories, it returns an empty list.

Using FileZilla to connect to this FTP server gave me the same result. I am able to see all files and directories under /aaa/bbb/ccc but I see nothing under the root: /

FTP is a Microsoft FTP Service (don't know any more details).

So, any suggestion on how to make ftputil stop trying to create these directories:

/aaa
/aaa/bbb
/aaa/bbb/ccc

and only try to create:

/aaa/bbb/ccc/ddd
/aaa/bbb/ccc/ddd/eee

since my current directory is already /aaa/bbb/ccc ?

Attachments (1)

seemingly_empty_directories.patch (2.1 KB) - added by schwa 4 years ago.
Patch of Roger's suggestion against the current host.py implementation

Download all attachments as: .zip

Change History (13)

comment:1 Changed 4 years ago by schwa

Added by Roger:

Ok... I did some more tests and found out that I am not able to see any files/directories inside:

/aaa
/aaa/bbb

I had to change FTPHost's makedirs implementation to this code:

    # Ignore unused argument `mode`
    # pylint: disable=unused-argument
    def makedirs(self, path, mode=None):
        """
        Make the directory `path`, but also make not yet existing
        intermediate directories, like `os.makedirs`. The value
        of `mode` is only accepted for compatibility with
        `os.makedirs` but otherwise ignored.
        """
        path = ftputil.tool.as_unicode(path)
        path = self.path.abspath(path)
        directories = path.split(self.sep)
        old_dir = self.getcwd()

        try:
            # Try to build the directory chain from the "uppermost" to
            # the "lowermost" directory.
            for index in range(1, len(directories)):
                # Re-insert the separator which got lost by using `path.split`.
                next_directory = self.sep + self.path.join(*directories[:index+1])
                try:
                    self.chdir(next_directory)
                except ftputil.error.PermanentError:
                    self.mkdir(next_directory)
        finally:
            self.chdir(old_dir)

It works like a charm... :)

Last edited 4 years ago by schwa (previous) (diff)

comment:2 Changed 4 years ago by schwa

I'm glad you could make it work - and saved me some brooding over the problem. ;-)

When seeing your solution I was thinking about whether it was a complete substitute (and hence could go into ftputil) or if it was failing for some other cases the current makedirs implementation covers. Particularly, could there be reasons for chdir to fail, apart from the non-existence of the directory? What about other problems?

I can think of these issues:

  • The directory next_directory exists, but you can't change to it because you don't have execute permission (or some equivalent on the server).

In this case mkdir would try to create it and fail and thus raise an exception (as in the old implementation). If the directory existed and was the last in the chain of directories to create, makedirs should have succeeded though. So this is a case that the old implementation covers but not the new one.

If the directory is not the last in the chain, chdir would fail and mkdir as well and raise a PermanentError. The same would have happened with the old implementation and is in my opinion the expected behavior.

  • The "directory" next_directory exists, but is a file.

Here, chdir would fail and then the mkdir as well, which would be the expected behavior.

  • You aren't allowed to change to /aaa though you might be allowed to change to /aaa/bbb.

On Unix, you can't change to /aaa/bbb if you can't change to /aaa if I'm not mistaken. I don't know if the same is true for FTP servers. It might be but due to some anomaly it might not be. If this anomaly applies, the old implementation might work but not the new one.

  • What cases might I have forgotten? ...

Can you think of a way to counter the above problems with the new approach but still keeping the advantages of the old and new one? Ok, the last bullet item seems obscure to me, but the problem from the first one not so much. But even if we were able to solve the problem from the first item, it would mean we make makedirs work for one obscure case, but fail for another obscure case. ;-/

comment:3 Changed 4 years ago by schwa

  • Description modified (diff)
  • Status changed from new to assigned

comment:4 Changed 4 years ago by schwa

Roger suggested this implementation which is a mix of the original implementation and his.

    # Ignore unused argument `mode`
    # pylint: disable=unused-argument
    def makedirs(self, path, mode=None):
        """
        Make the directory `path`, but also make not yet existing
        intermediate directories, like `os.makedirs`. The value
        of `mode` is only accepted for compatibility with
        `os.makedirs` but otherwise ignored.
        """
        path = ftputil.tool.as_unicode(path)
        path = self.path.abspath(path)
        directories = path.split(self.sep)
        old_dir = self.getcwd()
        try:
            # Try to build the directory chain from the "uppermost" to
            # the "lowermost" directory.
            for index in range(1, len(directories)):
                # Re-insert the separator which got lost by using `path.split`.
                next_directory = self.sep + self.path.join(*directories[:index+1])
                try:
                    self.chdir(next_directory)
                except ftputil.error.PermanentError:
                    try:
                        self.mkdir(next_directory)
                    except ftputil.error.PermanentError:
                        # Find out the cause of the error. Re-raise the
                        # exception only if the directory didn't exist already,
                        # else something went _really_ wrong, e. g. there's a
                        # regular file with the name of the directory.
                        if not self.path.isdir(next_directory):
                            raise
        finally:
            self.chdir(old_dir)

(I changed the last except to a finally.)

comment:5 Changed 4 years ago by schwa

I had been thinking about this when I answered your other mail, but this solution again has the isdir call, so I thought you would run into the same problem as before.

Now looking at this code, it seems that it would cover all cases the old code did. Also it passes all the unit tests I currently have. :-)

I think it would fail if there was a combination of the special case where you can't change into the last directory and the server pretends the directory is empty.

Still, I can't bring myself to include this in the regular ftputil because it's a workaround for a seemingly very rare case - and it has a complex nested structure of try - for - try - try. The complexity issue could maybe be remedied by extracting a part of the method into a new one, but the problem still seems too special to me to warrant the more complex implementation.

Admittedly, there's also (even more) complexity in ftputil because of problems with whitespace-containing directory names, but this was or is a problem with several servers, so the additional complexity seemed warranted.

For now, I'd recommend you override the original makedirs with your implementation in a derived class and use it instead of the native FTPHost. If it fits your problem better, you may even want to monkey-patch the original class.

Changed 4 years ago by schwa

Patch of Roger's suggestion against the current host.py implementation

comment:6 Changed 4 years ago by schwa

I added the above implementation suggestion as a patch in case someone else wants to apply it to the original code. That said, I rather recommend overriding the method in a derived class.

comment:7 Changed 4 years ago by schwa

As Roger pointed out, the concept is known as "virtual directories" and it seems to be mostly used by Microsoft. So this doesn't seem to be so rare and "anomal" than I thought. The effect of virtual directories is that directory listings don't reflect all the actual contents.

If a server uses virtual directories, this will affect not only makedirs, but directly and indirectly all ftputil commands that directory rely on listings, like stat, path.getmtime, path.isdir, to name a few.

Possibly (some) problems due to virtual directories could be handled with a generic helper like FTPHost._robust_ftp_command instead of adding special workarounds to almost every API.

As another approach, I wonder if it's feasible to let ftputil "remember" paths it has ever seen and "overlay" them with the visible directory contents. For example, if the user logs into the server and finds himself/herself in /aaa/bbb, although aaa isn't visible in /, ftputil could still store that / has an aaa directory and /aaa has a bbb directory. However, there's the problem that if the contents of / can't be retrieved, stat data for aaa will be missing. The same may apply to other directory levels.

Last edited 4 years ago by schwa (previous) (diff)

comment:8 Changed 4 years ago by schwa

  • Keywords virtual directories iis added

comment:9 Changed 4 years ago by schwa

I had some other ideas. First, for reference, the current implementation is:

    def makedirs(self, path, mode=None):
        """
        Make the directory `path`, but also make not yet existing
        intermediate directories, like `os.makedirs`. The value
        of `mode` is only accepted for compatibility with
        `os.makedirs` but otherwise ignored.
        """
        path = ftputil.tool.as_unicode(path)
        path = self.path.abspath(path)
        directories = path.split(self.sep)
        # Try to build the directory chain from the "uppermost" to
        # the "lowermost" directory.
        for index in range(1, len(directories)):
            # Re-insert the separator which got lost by using `path.split`.
            next_directory = self.sep + self.path.join(*directories[:index+1])
            try:
                self.mkdir(next_directory)
            except ftputil.error.PermanentError:
                # Find out the cause of the error. Re-raise the
                # exception only if the directory didn't exist already,
                # else something went _really_ wrong, e. g. there's a
                # regular file with the name of the directory.
                if not self.path.isdir(next_directory):
                    raise

Variant 1

Before iterating over the directories, get the current directory. Assume that this directory and all its parent directories exist. :-) In the following loop, ignore all directories where the path is contained in the previously determined current directory.

Pros:

  • In my opinion easier to understand than the revised algorithm above.
  • Should work as long as you changed into your virtual directory before calling makedirs.
  • Will save some server requests for the path components that already exist according to the current directory.
  • Should be easy to understand as long as the "ignore all directories where the path is contained in the previously determined current directory" part can be expressed understandably in an extracted method.

Cons:

  • Won't solve the original problem if the client did not change into "its" virtual directory before.
  • This might make it (even more) difficult to understand what's going on if you're not in the virtual directory and the command fails. For a moment I thought about storing past current directories, but some of them could be deleted. To keep track of this, we'd need to add logic to update the "previously seen current directories" cache, which in my opinion isn't worth it.

Variant 2

Check if the full directory to make already exists. If yes, we're done (but see con below).

Split the full potential directory path into components as in the current implementation. Then start (trying) to create directories with the last directory (right to left), until this succeeds. (If we run into the root and still can't create a directory, raise an exception.) Then use mkdir to add the previously "missing" directory components from left to right.

Pros:

  • In my opinion easier to understand than the revised algorithm above if we can drop the nested trys.
  • Doesn't depend on the current directory like variant 1.

Cons:

  • Might require more server requests because the algorithm tries to create a sequence of potentially non-existing directories. On the other hand, the current implementation has a similar weakness in that it might try to create directories that already exist.
  • The initial check for the presence of the directory is critical. If the directory "above" the last directory is a virtual directory, a test with path.isdir won't see the last directory and return False. If the last directory doesn't have "change-into" permission, a test with chdir will fail. This con is in common with the original implementation suggestion. That said, if we start at the end, it's less likely that we run into the invisible-directory problem.
  • It might be difficult to analyze the algorithm since mkdir has an additional failure mode apart from "no permission" and "already exists": "parent directory doesn't exist".

Overall discussion

If we can assume that the current directory of the FTP client is inside the virtual directory, I like variant 1 the best.

Roger's suggestion would be very likely ok for me if the loop body is extracted into its own method. A slight annoyance is that by using chdir, the extracted method will modify some implicit instance state but not restore it.

Variant 2 would be fine if there wasn't still the limitation with recognizing the existence of a directory. This variant also has an additional failure mode. When I initially thought about this algorithm, it seemed quite elegant, but at this point I hadn't thought about the case that the full directory might already exist.

comment:10 Changed 4 years ago by schwa

  • Milestone set to 3.3
  • Version set to 3.2

comment:11 follow-up: Changed 4 years ago by schwa

I plan to use Roger's suggestion from comment 4. However, I first want to create a unit test first that involves a simulated virtual directory and thus fails for the old implementation.

comment:12 in reply to: ↑ 11 Changed 3 years ago by schwa

  • Resolution set to fixed
  • Status changed from assigned to closed

Replying to schwa:

I plan to use Roger's suggestion from comment 4. However, I first want to create a unit test first that involves a simulated virtual directory and thus fails for the old implementation.

I applied a variant of Roger's patch in [b250e8ed87a6]. The only difference between the changeset and Roger's patch is some reformatting.

I didn't implement unit tests yet because this would probably very complicated with the current mock code in test.mock_ftplib. I plan to add tests for this ticket when I migrate the tests to use the mock library (see also ticket #98).

Note: See TracTickets for help on using tickets.