~sschwarzer/ftputil#149: 
remote files randomly truncated


#Context

  • ftputil : 5.0.2
  • python : 3.10.0
  • server/client os : Windows 10 Entreprise
  • ftp server : pyftpdlib 1.5.6

(error also happens in other contexts, for example:

  • ftputil 3.x
  • python 3.6.x
  • linux/ubuntu)

#Bug description

When writing data towards server FTPS, target file is randomly truncated. No traceback, truncation is not detected by client.


#Code example

import io
import logging
import json

logging.basicConfig(level=logging.DEBUG)

import ftplib
from ftputil import FTPHost
import ftputil.session

session_factory = ftputil.session.session_factory(base_class=ftplib.FTP_TLS, port=60000)


file_name = "big_file.json"

with open(file_name) as f:
    data = json.load(f)


with FTPHost(
        "localhost",
        "user",
        "12345",
        session_factory=session_factory) as host:
    with host.open(file_name, "w") as target_f:
        host.copyfileobj(io.StringIO(json.dumps(data, indent=4)), target_f)
        # host.stat(file_name)

#Server code (if useful)

"""
An RFC-4217 asynchronous FTPS server supporting both SSL and TLS.
Requires PyOpenSSL module (http://pypi.python.org/pypi/pyOpenSSL).
"""
import logging

logging.basicConfig(level=logging.DEBUG)

from pyftpdlib.servers import FTPServer
from pyftpdlib.authorizers import DummyAuthorizer
from pyftpdlib.handlers import TLS_FTPHandler


def main():
    authorizer = DummyAuthorizer()
    authorizer.add_user('user', '12345', './ftp_files', perm='elradfmwMT')
    authorizer.add_anonymous('.')
    handler = TLS_FTPHandler
    handler.certfile = "./ssl/ssl.chain"
    handler.keyfile = "./ssl/ssl.key"
    handler.authorizer = authorizer
    handler.masquerade_address = '0.0.0.0'
    handler.passive_ports = range(60001, 65535)
    # requires SSL for both control and data channel
    handler.tls_control_required = True
    handler.tls_data_required = True
    server = FTPServer(("", 60000), handler)
    server.serve_forever()


if __name__ == '__main__':
    main()

#Ideas

  • bug only happens when using ftps, not ftp
  • bug also happens using write and not copyfileobj
  • bug also happens when directly writing file (open(file_path) as f), but much rarer then when using io.StringIO(json.dumps(...
  • using host.stat(remote_path) before closing remote file highly decreeses error rate
  • truncation size is random
Status
RESOLVED FIXED
Submitter
ftputiluser (unverified)
Assigned to
No-one
Submitted
2 years ago
Updated
2 years ago
Labels
bug library

schwa (unverified) 2 years ago · edit

Thanks a lot for the thorough report and for trying so many things to close in on the problem! :-)

This bug is probably going to be interesting, but let's see. ... ;-)

I have a few questions:

  • Were you able to experiment with another FTP server?
  • At least some of the "ideas" look as if they're actual observations, though "ideas" conveys that these items are "only" suspicions. Are all of these points observations? Are some of them suspicions instead of observations, and if yes, which ones?
  • When you say the truncation size is random, do you see truncations only on sizes that are divisible by a certain value? For example, is the size is always a multiple of the buffer size of copyfileobj?

I hope to find the time to look further into this ticket on the weekend.

schwa (unverified) 2 years ago · edit

One more question: When you say, "truncation size is random", does this refer to the very same file? Or do you mean you used different input/data and can't recognize a pattern between the original size of the data and the size you're seeing in the written remote data?

ftputiluser (unverified) 2 years ago · edit

Hello, thanks for your comments !

Sorry for my late reply, is there a way to be notified by mail of the activity of this ticket ?

To answer your questions :

  • Were you able to experiment with another FTP server? : no
  • At least some of the "ideas" look as if they're actual observations, though "ideas" conveys that these items are "only" suspicions. Are all of these points observations? Are some of them suspicions instead of observations, and if yes, which ones? : The only thing that is suspicious is FTPS : the problem never occurs using FTP. The rest of the points are observations, I shouldn't have called them ideas. I wouldn't say they are suspicious, but they make me feel that ftps connection may be closed before write is finished, under certain circumstances, depending on transfer rates and file size. However my knowledge on both FTP protocol and TLS is insufficient, so I'm realy not sure it is a good direction to investigate.
  • When you say the truncation size is random, do you see truncations only on sizes that are divisible by a certain value? For example, is the size is always a multiple of the buffer size of copyfileobj? I haven't payed attention and did not keep a copy of all truncated files. One of them is 435 KB for example.
  • When you say, "truncation size is random", does this refer to the very same file? Or do you mean you used different input/data and can't recognize a pattern between the original size of the data and the size you're seeing in the written remote data? : I mean on the same file : same operation on same file leads to random truncations.

schwa (unverified) 2 years ago · edit

Sorry for my late reply, is there a way to be notified by mail of the activity of this ticket ?

You can put yourself into the cc field when you modify the ticket, but I guess this will be readable by spammers, too. :-/

It's planned to ​move the ftputil infrastructure to ​Sourcehut sometime during the next few months. I expect the ticket situation will get better then.

Thanks for your answers. I hope I can look into this on the weekend. Probably it makes most sense if I try to reproduce the problem with your pyftpdlib code before trying to reproduce it with my local FTP/FTPS server.

One more question though: Do you see truncation if you just write a large string with remote_file.write?

You wrote something about that, but I'm not sure if you tried to leave out the JSON dump and the use of StringIO. I ask because I think we should make sure that no buffering effects of json.dumps or StringIO play a role here. (I doubt there are any, but still I suggest we should try to reproduce the problem with as few moving parts as possible.)

Another question: Did you leave the FTP server running for a bit after closing the ftputil remote file? Maybe the server buffers some data and writes it only after a while. Can you try letting the server idle ten seconds or so after the ftputil remote file is closed?

ftputiluser (unverified) 2 years ago · edit

1. using remote_file.write

The same problem occurs.

The string does not need to be very big.

This time I checked the size and here is what I get :

source file: 489 KB

  • 1st write, target file: 208 KB
  • 2nd write, target file: 224 KB
  • 3rd, 4th, 5th write, target file: 208 KB
  • 6th, 7th write: 176 KB
  • 8th write: 192 KB

...

2. leaving the server open

During all my tests, the server stays open a lot more than 10 seconds after running the test.

schwa (unverified) 2 years ago · edit

I can reproduce the problem with the following client script:

import io
import logging

logging.basicConfig(level=logging.DEBUG)

import ftplib
from ftputil import FTPHost
import ftputil.session

session_factory = ftputil.session.session_factory(base_class=ftplib.FTP_TLS, port=60000)

file_name = "my_data"

data = 489 * 1024 * "x"
data = data.encode('ascii')
print("=== data:", len(data))
# with open(file_name, 'wb') as fobj:
#     fobj.write(data)

with FTPHost(
        "localhost",
        "user",
        "12345",
        session_factory=session_factory) as host:
    with host.open(file_name, "wb") as target_f:
        host.copyfileobj(io.BytesIO(data), target_f)

I see interesting messages in the server log (printed to the screen):

DEBUG:pyftpdlib:127.0.0.1:58224-[user] <- STOR my_data
DEBUG:pyftpdlib:127.0.0.1:58224-[user] -> 125 Data connection already open. Transfer starting.
DEBUG:pyftpdlib:[debug] call: _do_ssl_handshake, err: ssl-want-read (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>)
DEBUG:pyftpdlib:[debug] SSL connection established (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>)
DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(104, 'ECONNRESET') (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=229376)>)
DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown() -> os.write, err: BrokenPipeError(32, 'Broken pipe') (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=229376)>)
DEBUG:pyftpdlib:[debug] call: close() (<TLS_DTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=229376)>)
DEBUG:pyftpdlib:127.0.0.1:58224-[user] -> 226 Transfer complete.
INFO:pyftpdlib:127.0.0.1:58224-[user] STOR /.../ftp_files/my_data completed=1 bytes=229376 seconds=0.003
DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(-1, 'Unexpected EOF') (<TLS_FTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user')>)
DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown(), shutdown completed (<TLS_FTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user')>)
DEBUG:pyftpdlib:[debug] call: close() (<TLS_FTPHandler(id=140404941031120, addr='127.0.0.1:58224', ssl=True, user='user')>)
INFO:pyftpdlib:127.0.0.1:58224-[user] FTP session closed (disconnect).
DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(-1, 'Unexpected EOF') (<TLS_FTPHandler(id=140404957278752, addr='127.0.0.1:58222', ssl=True, user='user')>)
DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown(), shutdown completed (<TLS_FTPHandler(id=140404957278752, addr='127.0.0.1:58222', ssl=True, user='user')>)
DEBUG:pyftpdlib:[debug] call: close() (<TLS_FTPHandler(id=140404957278752, addr='127.0.0.1:58222', ssl=True, user='user')>)
INFO:pyftpdlib:127.0.0.1:58222-[user] FTP session closed (disconnect).

Note the BrokenPipeError and the most likely related SysCallError(-1, 'Unexpected EOF'). As in your case, the number of transferred bytes, here 229376, varies between runs of the client script.

I'll try to find out more.

schwa (unverified) 2 years ago · edit

With a time.sleep(1) at the end of the inner with block, i.e.

   with host.open(file_name, "wb") as target_f:                                 
        host.copyfileobj(io.BytesIO(data), target_f, max_chunk_size=sys.maxsize)
        time.sleep(1)

the file is transferred completely, but the server output still reports a broken pipe error:

DEBUG:pyftpdlib:127.0.0.1:58310-[user] <- STOR my_data
DEBUG:pyftpdlib:127.0.0.1:58310-[user] -> 125 Data connection already open. Transfer starting.
DEBUG:pyftpdlib:[debug] call: _do_ssl_handshake, err: ssl-want-read (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>)
DEBUG:pyftpdlib:[debug] SSL connection established (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=0)>)
DEBUG:pyftpdlib:[debug] call: recv(), err: SysCallError(104, 'ECONNRESET') (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=500736)>)
DEBUG:pyftpdlib:[debug] call: _do_ssl_shutdown() -> os.write, err: BrokenPipeError(32, 'Broken pipe') (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=500736)>)
DEBUG:pyftpdlib:[debug] call: close() (<TLS_DTPHandler(id=140407016892112, addr='127.0.0.1:58310', ssl=True, user='user', ssl-data=True, sending-file=<_io.BufferedWriter name='/.../ftp_files/my_data'>, bytes-trans=500736)>)
DEBUG:pyftpdlib:127.0.0.1:58310-[user] -> 226 Transfer complete.
INFO:pyftpdlib:127.0.0.1:58310-[user] STOR /.../ftp_files/my_data completed=1 bytes=500736 seconds=1.005

For the record, a time.sleep after the inner with block, but inside the outer with block, doesn't seem to help though.

I have a suspicion what's going on, but need to look into this more.

schwa (unverified) 2 years ago · edit

I think I fixed the problem. Would you please clone/pull the latest change from ​https://git.sr.ht/~sschwarzer/ftputil and verify whether the change also works for you and report back?

schwa (unverified) 2 years ago · edit

Commit message:

Unwrap `FTPFile` TLS socket before closing it

Without the unwrapping, there may be data "stuck" in the TLS wrapper, so
the underlying socket isn't "flushed" when closing the socket with
`self._conn.close()`.

This problem was raised in ticket <a href="/~sschwarzer/ftputil/149" title="~sschwarzer/ftputil#149: remote files randomly truncated">#149</a>. Probably it wasn't noticed
earlier, since waiting a short time (one second or even less; where
"waiting" could mean doing anything else that takes a moment) between
the last write to the socket and closing the `FTPFile` seems to transmit
all data. That said, pyftpdlib still reports a broken pipe error in this
case.

I found the solution when checking the current ftplib source code for
`FTP.storbinary`.

With this commit, the workaround of waiting before closing the `FTPFile`
is no longer needed and the broken pipe error in the pyftpdlib log is
gone.

ticket: 149

ftputiluser (unverified) 2 years ago · edit

Hello, I confirm it works for me !

Well done and thanks for your reactivity.

And by the way, congratulations and thanks for all your work on ftputil !

schwa (unverified) 2 years ago · edit

I'm glad to hear the change works for you! :-) I'll probably release a new ftputil version on Wednesday.

Well done and thanks for your reactivity. And by the way, congratulations and thanks for all your work on ftputil !

Thank you, such compliments are always nice to get. :-)

Register here or Log in to comment, or comment via email.