#!/usr/bin/env python3.2
# -*- coding: utf-8 -*-
"""
this bugreport is basically a python file that may be executed in order to
reproduce the problem described.
the suggested workaround in <a href="/~sschwarzer/ftputil/52" title="~sschwarzer/ftputil#52: Main FTPHost session times out while downloading a large file">#52</a> does not work for 421 and proftpd, as the
[documentation of TimeoutNoTransfer] suggests: by issuing a pwd command to the
server, this timeout is not reseted. with a local installation of proftpd,
using the commandline program 'ftp' and issuing pwd commands to reset the
timeout i saw the error code 421. listing the current directory helps to
prevent the timeout.
this demo file was verified with
Linux debian 3.2.0-4-amd64 <a href="/~sschwarzer/ftputil/1" title="~sschwarzer/ftputil#1: Add russian documentation to wiki (Anton Stepanov)">#1</a> SMP Debian 3.2.57-3+deb7u1 x86_64 GNU/Linux
python3.2
ftputil3.0
ProFTPD Version 1.3.4a
the default value of TimeoutNoTransfer is 300 seconds. in order to reproduce
the issue without waiting five minutes, i changed it to 3 seconds. in debian
you have to change line 23 of /etc/proftpd/proftpd.conf to set this value.
silent_fail was discovered while trying to reproduce the issue and is not
really relevant to my original usecase.
upload_exception in contrast reflects my usecase very well. the production
server i wanted to upload files to has a TimeoutNoTransfer setting of 600, so
after ten minutes of uploading a exception was raised.
i think the time to debug this issue would have been sorter if the
'Implementation notes' of class FTPHost would be written to the documentation.
after reading this it makes absolutely sense to open two ftp connections while
there is only one FTPHost object visible to the user. without this information
(thinkign FTPHost will only open one connection to the server) the error code
421 during a running download is quite crazy.
a workaround is included in the function upload_exception, but not activated.
[documentation of TimeoutNoTransfer]
http://www.proftpd.org/docs/directives/linked/config_ref_TimeoutNoTransfer.html
"""
import time
import os
from ftputil import FTPHost
def silent_fail(host):
    """
    Given the ProFTPD setting ``TimeoutNoTransfer 3`` and ``time_to_wait = 2``,
    only a part of the tree is printed, but no exception is raised. With
    ``time_to_wait = 0``, the files of all subdirectories are printed.

    Output for ``time_to_wait = 2``::

        /home/login/ .bash_history
        /home/login/ .bash_logout
        /home/login/ .bashrc
        /home/login/ .profile

    (no exception or any error message)

    Output for ``time_to_wait = 0``::

        /home/login/ .bash_history
        /home/login/ .bash_logout
        /home/login/ .bashrc
        /home/login/ .profile
        /home/login/testfolder asd
        /home/login/testfolder random
    """
    time_to_wait = 0
    for root, dirs, files in host.walk("/home/login/"):
        #print(root, dirs, files)
        for filename in files:
            print(root, filename)
            with host.open(os.path.join(root, filename), 'rb') as fileobject:
                host.stat(os.path.join(root, filename))
                time.sleep(time_to_wait)
                fileobject.read()
    host.close()
def upload_exception(host):
    """
    ProFTPD server with setting ``TimeoutNoTransfer 3``.

    Output of this function::

        going to create folder 0
        created folder 0
        opening file 0 0
        closed file 0 0
        going to keep alive
        keeped alive
        opening file 0 1
        closed file 0 1
        going to keep alive
        keeped alive
        opening file 0 2
        closed file 0 2
        going to keep alive
        Traceback (most recent call last):
          File "/home/bs/Code/ftp-dab/v/lib/python3.2/site-packages/ftputil-3.0-py3.2.egg/ftputil/host.py", line 110, in keep_alive
            self._session.pwd()
          File "/usr/lib/python3.2/ftplib.py", line 584, in pwd
            resp = self.voidcmd('PWD')
          File "/usr/lib/python3.2/ftplib.py", line 260, in voidcmd
            return self.voidresp()
          File "/usr/lib/python3.2/ftplib.py", line 234, in voidresp
            resp = self.getresp()
          File "/usr/lib/python3.2/ftplib.py", line 227, in getresp
            raise error_temp(resp)
        ftplib.error_temp: 421 No transfer timeout (3 seconds): closing control connection

        During handling of the above exception, another exception occurred:

        Traceback (most recent call last):
          File "ftplibtest.py", line 107, in <module>
            upload_exception(host)
          File "ftplibtest.py", line 100, in upload_exception
            host.keep_alive()
          File "/home/bs/Code/ftp-dab/v/lib/python3.2/site-packages/ftputil-3.0-py3.2.egg/ftputil/host.py", line 110, in keep_alive
            self._session.pwd()
          File "/home/bs/Code/ftp-dab/v/lib/python3.2/site-packages/ftputil-3.0-py3.2.egg/ftputil/error.py", line 124, in __exit__
            raise TemporaryError(*exc_value.args)
        ftputil.error.TemporaryError: 421 No transfer timeout (3 seconds): closing control connection
        Debugging info: ftputil 3.0, Python 3.2.3 (linux2)

    It's interesting to note that if you remove the ``keep_alive`` call, the
    exception is raised later (when trying to create the folder).
    """
u = "upload"
# first remove the folder from the last upload:
if host.path.exists(u):
host.rmtree(u)
host.mkdir(u)
# now let's create some folders and add some data
for i in range(10):
print("going to create folder", i)
host.mkdir(os.path.join(u, str(i)))
print("created folder", i)
for k in range(10):
print("opening file", i, k)
with host.open(os.path.join(u, str(i), str(k)), "w") as fo:
fo.write("".join((chr(i) for i in range(255))))
time.sleep(1)
print("closed file", i, k)
print("going to keep alive")
host.keep_alive()
print("keeped alive")
#print("trying to fix it")
#host.listdir("/")
#print("this fixes the issue")
if __name__ == '__main__':
host = FTPHost('localhost', 'login', 'password')
#silent_fail(host)
upload_exception(host)
Thanks for the report!

I wasn't aware that there are FTP servers that make a distinction between an idle timeout and a data transfer timeout. For me, it was always just a "timeout", an idle timeout implied. Hence, the `keep_alive` method assumes that retrieving the current directory will prevent the "main" connection from timing out if the method is used in time before the timeout happens.

As you say, a workaround might be to get a directory listing. I'm quite hesitant to add this to `keep_alive` by default because for large directories this might take a relatively long time. When reading your report, I thought for a moment about getting a directory listing when the user passes an additional flag to `keep_alive`, but this doesn't feel right to me. With this, `keep_alive` becomes too low-level and server-specific in my opinion.

Keeping all connections of an `FTPHost` instance alive has always been a challenge; actually, despite a lot of experimenting, I didn't find a way that works in all cases. In hindsight I wonder if adding `keep_alive` was a good idea after all, since the feature is bound to cause subtle problems, at least in special cases.

To sum this up, I think you should use your workaround to keep the connection alive. You may consider inheriting from `FTPHost` and overriding `keep_alive` to get the directory listing if you always want to do this.

About moving the "implementation notes" to the documentation proper: When I wrote these implementation notes, I really considered this an implementation detail that shouldn't be mentioned in the documentation for users of ftputil. On the other hand, I can't completely hide that ftputil uses additional FTP connections behind the scenes (see "leaky abstraction"). So I probably should tell a bit more about the subject than the mere hints in the documentation on `keep_alive`. (I did something similar for caching. Although that is something that ideally should be hidden inside ftputil, the caching might have confusing side effects, so I decided to say a bit about it.)

I plan to do a documentation change sometime in June and release a new ftputil version, together with fixes for some previous tickets.

Please let me know what you think.
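The subclassing suggestion above might be sketched as follows. This is only an illustration, not ftputil code: the base class here is a stand-in that mimics the relevant part of `FTPHost` so the example runs without a server, and `ListingFTPHost` is a hypothetical name. In real code you would inherit from `ftputil.FTPHost` instead.

```python
# Sketch of the suggested workaround: override keep_alive so it issues
# a directory listing, which ProFTPD counts as a transfer (resetting
# TimeoutNoTransfer), instead of the PWD that the stock keep_alive uses.

class FakeFTPHost:
    """Stand-in for ftputil.FTPHost; records the commands it would send."""
    def __init__(self):
        self.sent = []

    def keep_alive(self):
        self.sent.append("PWD")  # what the stock keep_alive effectively does

    def getcwd(self):
        return "/home/login"

    def listdir(self, path):
        self.sent.append("LIST " + path)
        return []

class ListingFTPHost(FakeFTPHost):
    def keep_alive(self):
        # A listing is a data transfer, so it also resets TimeoutNoTransfer.
        self.listdir(self.getcwd())

host = ListingFTPHost()
host.keep_alive()
print(host.sent)  # ['LIST /home/login']
```

Whether the extra cost of a listing is acceptable depends, as noted above, on the size of the directory being listed.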
I will implement the workaround in my application, so my personal problem is solved. But it would be nice to find a general solution for the problem.

**first thought**

I had exactly the same thought about the parameter for `keep_alive`. Perhaps a solution could be to keep the function but remove the implementation (as it depends on the FTP server configuration), so the user would have to provide an FTP command/callback that keeps his connection alive.

**a completely different approach**

I'm not familiar enough with the implementation details of ftputil, but couldn't `upload_exception` technically work with only one FTP connection? Perhaps it would be possible to add a parameter for a maximum connection count. This way the user would be aware that there are multiple FTP connections in the background, and I could set the value to one. If the code really needs too many connections, ftputil would raise an exception, suggesting to raise the maximum connection count.

I wonder whether ftputil works fine with FTP servers that limit the number of concurrent connections?

**a third idea**

Why not simply reopen the connection? If reopening the connection fails, we have a real problem (= exception); otherwise we have a short delay. `keep_alive` could be removed.

PS: The "leaky abstraction" article was an interesting read, thanks!
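The callback idea above might be sketched like this. All names are hypothetical, not ftputil API; the `Host` class is a minimal stand-in so the example runs without a server.

```python
# Sketch of a keep_alive that delegates to a user-supplied callable
# instead of hard-coding a server-specific command.

class Host:
    """Stand-in for an FTPHost-like object; records commands it would send."""
    def __init__(self, keep_alive_callback=None):
        # The user supplies whatever command keeps *their* server happy.
        self._keep_alive_callback = keep_alive_callback
        self.sent_commands = []

    def listdir(self, path):
        self.sent_commands.append("LIST " + path)
        return []

    def keep_alive(self):
        if self._keep_alive_callback is None:
            raise NotImplementedError("provide a keep_alive callback")
        self._keep_alive_callback(self)

# For ProFTPD, a listing resets TimeoutNoTransfer, so the user passes:
host = Host(keep_alive_callback=lambda h: h.listdir("/"))
host.keep_alive()
print(host.sent_commands)  # ['LIST /']
```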
Thanks for your input. :-)

Replying to ftputiluser:

> I will implement the workaround in my application, so my personal problem is solved. But it would be nice to find a general solution for the problem.

Good. :-)

> **first thought**
>
> I had exactly the same thought about the parameter for `keep_alive`. Perhaps a solution could be to keep the function but remove the implementation (as it depends on the FTP server configuration), so the user would have to provide an FTP command/callback that keeps his connection alive.

I can imagine two ways to interpret "remove the implementation": Do you mean that the implementation should be `pass`, or should the method raise a `NotImplementedError`? I'm not comfortable with the first approach because it might cause existing ftputil client code to (subtly) fail that used to work. Granted, the current implementation isn't guaranteed to work, but I believe it works in most cases. I wouldn't want to raise an exception either, because that would obviously change the API and break client code.

That said, changing APIs isn't taboo, but there were already some significant changes from ftputil 2.8 to 3.0, and I wouldn't want to change more now unless to fix a "real" bug or an otherwise really awkward problem.

> **a completely different approach**
>
> I'm not familiar enough with the implementation details of ftputil, but couldn't `upload_exception` technically work with only one FTP connection?

I'm not sure what you mean by "work with only one FTP connection". My problem with the workaround in `upload_exception` is not that it wouldn't work but that it might be costly if the directory has lots of directories or files in it. It also crosses my mind now that listing the current directory will fail if you changed into it but don't have read permission. I admit that's a strange case, but in the old days of ftputil a bug was actually filed that turned out to come from an unreadable login directory.

> Perhaps it would be possible to add a parameter for a maximum connection count. This way the user would be aware that there are multiple FTP connections in the background, and I could set the value to one. If the code really needs too many connections, ftputil would raise an exception, suggesting to raise the maximum connection count.

I have the impression that would be more trouble than it's worth. ftputil needs additional connections to support remote file-like objects. If I limit the number of background connections, I also limit one of the core functionalities of ftputil. Maybe it's just not clear how ftputil works: every open remote file requires another FTP connection because of the way FTP works. The purpose of the additional connections for remote files is not to speed things up, for example.

> I wonder whether ftputil works fine with FTP servers that limit the number of concurrent connections?

A very good point. I hadn't thought about this. If the FTP server allows only n connections from a client IP, you can open at most n-1 remote files at the same time. The `FTPHost` instance needs one connection, and each opened remote file needs one more connection.

> **a third idea**
>
> Why not simply reopen the connection? If reopening the connection fails, we have a real problem (= exception); otherwise we have a short delay. `keep_alive` could be removed.

Which connection should be re-opened? ;-) I guess you mean the "main" connection of the `FTPHost` instance? The other connections aren't affected by `FTPHost.keep_alive` anyway, as stated in the documentation. I haven't found a way to provide keep-alive functionality for the open remote file connections.

Honestly, I don't want to re-open a connection silently, without the user knowing. One reason is that the server uses timeouts to protect itself from overload, so I believe if a user wants to prevent a timeout, he/she should do it consciously (by calling `keep_alive`). Another reason to avoid re-connections behind the scenes is that ftputil would need to maintain some connection state. At the moment, I can only think of the current directory, but there may be more.

Overall, I have the impression that "fixing" `keep_alive` would mean adding workaround code to a workaround (i.e. `keep_alive` itself). If that happens, there's probably something wrong, in my experience. ;-/ By the way, this reminds me of the support for hidden files (see tickets #23 and [#65](https://todo.sr.ht/~sschwarzer/ftputil/65 "#65: defect: Auto-probing of LIST's -a option is faulty (Ticket #23) (closed: fixed)")). I guess you see the similarity. :-)

> PS: The "leaky abstraction" article was an interesting read, thanks!

Yes, Joel also has a funny way of writing. :-)
**keep_alive function**

Uh, I did not really think about existing code. You are right, this is not really an option.

**upload_exception function and multiple connections**

Sorry, I did not explain my idea very well. I understand that in some cases you need multiple connections to support the remote file-like objects, but I think the code of `upload_exception` (remove the workaround and `keep_alive` and just think of what it tries to accomplish) could technically work with only one connection (?)

I wanted to propose opening the new connection only when it's really needed, not when a remote file-like object is opened, but I think I'm missing some aspect that prohibits this.

**limit number of connections via parameter**

One could set the default value to 0, so there is no maximum connection limit. But you are right, this does not solve the original problem.

**reconnection**

Uh, I did not think about the connection state. But the reconnection does not have to be silent. There could be a callback that is called before a timeout error, and the user could then return a new connection. But there is still the connection state problem.
Replying to ftputiluser:

> **upload_exception function and multiple connections**
>
> Sorry, I did not explain my idea very well. I understand that in some cases you need multiple connections to support the remote file-like objects, but I think the code of `upload_exception` (remove the workaround and `keep_alive` and just think of what it tries to accomplish) could technically work with only one connection (?)

Each open remote file requires an FTP connection. This includes remote files created rather explicitly via `FTPHost.open` as well as remote files implicitly created by `FTPHost.upload(_if_newer)` and `FTPHost.download(_if_newer)`. So your `upload_exception` function will need two remote connections: one "main" connection for commands like `mkdir` and another connection for a remote file opened with `FTPHost.open`. On the plus side, after the remote file opened with `FTPHost.open` is closed implicitly via the `with` statement, the underlying FTP connection can be re-used. (ftputil caches the underlying FTP connections of closed remote files and re-uses them for subsequently opened remote files.)

FTP connections have an associated state, not only from the point of view of ftputil, but also from the point of view of the server. If one connection is used to send a command like `mkdir`, you can't use it at the same time for a file transfer. It's a bit like sending a messenger to a city in the north, and when he's halfway there, telling this same messenger to immediately take another message to a city in the south. For FTP connections it's actually worse: re-using the connection won't just delay the first task, but cause a server error (like the messenger quitting his job because you gave him stupid orders ;-) ).

> I wanted to propose opening the new connection only when it's really needed, not when a remote file-like object is opened, but I think I'm missing some aspect that prohibits this.
You'll need a new FTP connection particularly for each remote file open at the same time.
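The re-use behavior described above (connections of closed remote files being cached and handed out again for later opens) can be illustrated with a toy pool. This is a sketch of the general technique only, not ftputil's actual code; the dict stands in for a real FTP session.

```python
# Toy sketch: closed child connections go back into an idle pool and
# are handed out again for the next open() instead of establishing a
# new session each time.
class ChildSessionPool:
    def __init__(self):
        self._idle = []
        self.sessions_created = 0

    def acquire(self):
        # Re-use an idle session if one is available ...
        if self._idle:
            return self._idle.pop()
        # ... otherwise "establish" a new one.
        self.sessions_created += 1
        return {"id": self.sessions_created}

    def release(self, session):
        self._idle.append(session)

pool = ChildSessionPool()
first = pool.acquire()   # remote file opened: new session
pool.release(first)      # remote file closed: session goes idle
second = pool.acquire()  # next open re-uses the cached session
print(second is first, pool.sessions_created)  # True 1
```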
> **reconnection**
>
> Uh, I did not think about the connection state. But the reconnection does not have to be silent. There could be a callback

I think it's too complicated and not worth the trouble. :-)

> that is called before a timeout error

An FTP client can notice the timeout only after it has happened. Only when the client tries to use the connection again after the timeout will it get an error message from the server.

> and the user could then return a new connection. But there is still the connection state problem.

Again, I think it doesn't reduce the complexity compared with having the client code re-establish a connection itself.
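The client-side recovery that the discussion converges on (the application re-establishing the connection itself) might look roughly like this. Catching `ftplib.error_temp` for a 421 response is an assumption based on the traceback in the report (ftputil itself wraps it in `ftputil.error.TemporaryError`), and `make_host`/`action` are hypothetical names; the demo uses stand-ins instead of a real server.

```python
import ftplib

def run_with_reconnect(make_host, action, retries=1):
    """Run action(host); on a 421 temporary error, reconnect and retry.

    Note: any per-connection state (e.g. the current directory) is lost
    on reconnect and must be restored by the caller.
    """
    host = make_host()
    for attempt in range(retries + 1):
        try:
            return action(host)
        except ftplib.error_temp as exc:
            if attempt == retries or not str(exc).startswith("421"):
                raise
            host = make_host()  # re-establish the control connection

# Demo with stand-ins: the first call times out, the retry succeeds.
attempts = []

def flaky_action(host):
    attempts.append(host)
    if len(attempts) == 1:
        raise ftplib.error_temp("421 No transfer timeout (3 seconds)")
    return "done"

result = run_with_reconnect(lambda: object(), flaky_action)
print(result)  # done
```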
I decided not to change the `keep_alive` method, but I added a documentation section to explain why timeouts can still happen if there are open remote files (changeset 44cf845b3a2a7dab8d0eb1791caa5c4eb018011e).

I wanted to mark this ticket as "invalid" since I can't do anything to "prevent" timeouts with `keep_alive`. FTP client/server communication just doesn't work this way. Ok, in theory, `keep_alive` could start a thread and get the current directory from time to time. But this wouldn't work if the "original" thread used the "main" connection at the same time. On the other hand, I could introduce a lock to prevent this and ...

You see, this can get complicated easily. :-) I won't do this, I think. Hence I mark the ticket as "wontfix".