Opened 13 years ago

Closed 13 years ago

#13 closed defect (worksforme)

Small files not always transferred successfully

Reported by: paschott@… Owned by: schwa
Priority: major Milestone: 2.1
Component: Library Version: 2.0.3
Keywords: small files, original and copy differ Cc:

Description

Hard to be more specific with this. I'm transferring a lot of PGP-encrypted files between servers and occasionally run into this with files in the 3-6 KB size range - often enough to be annoying, as I have to request that the file be sent a different way.

I don't know if the file is just not copied bit for bit or if something is added/left out as I can't easily trace what's going on within FTPUtil. I'm willing to work with this if there's a way to generate some debug information.

Running Python 2.4.1 for the backend w/ latest release of FTPUtil as far as I can tell.

-Pete Schott

Attachments (2)

random_data (5.0 KB) - added by schwa 13 years ago.
5120 bytes of pseudo-random data
upload_download_test.py (3.1 KB) - added by schwa 13 years ago.
Test script for uploads/downloads


Change History (8)

comment:1 Changed 13 years ago by schwa

Keywords: small files original and copy differ added
Milestone: 2.1
Status: new → assigned

Things that may help to find a solution for the problem:

  • What do you mean by "not successful"? Are the copies of your files much shorter than the originals? Do you get a traceback (if yes, please post it)?
  • Which Python statements are you using for the transfer?
  • Are the PGP files sensitive to changes of line ending characters or are they in an encoded format where the type of line endings (CRLF/LF/CR) doesn't matter?
  • Are you using binary transfer mode? The default is ASCII/text transfer mode, because that's also Python's default when opening files.
  • What is the difference in length between the original file/s and the copy/ies?
  • Can you observe a pattern, for example:
    • Is the first wrong byte always at the same position?
    • What byte values are in the "neighborhood" of the wrong bytes?

If the above doesn't help to spot the error, could you attach an example of a problematic file and the result of the erroneous transfer to this ticket? (no confidential/secret information, of course)
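The questions about length differences, the position of the first wrong byte and its neighborhood can be answered with a small comparison helper. The following is only a sketch, assuming a current Python 3 rather than the Python 2.4 mentioned in the ticket; the names `describe_difference` and `compare_files` are made up for illustration and are not part of ftputil.

```python
def describe_difference(original, copy, context=8):
    """Summarize how two byte strings differ; return None if they are equal."""
    if original == copy:
        return None
    # Find the first offset at which the two byte strings disagree.
    # If one is a prefix of the other, the mismatch is at the shorter length.
    limit = min(len(original), len(copy))
    first = next((i for i in range(limit) if original[i] != copy[i]), limit)
    start = max(0, first - context)
    return {
        "length_difference": len(copy) - len(original),
        "first_mismatch": first,
        # Hex dumps of the bytes around the first mismatch.
        "original_neighborhood": original[start:first + context].hex(),
        "copy_neighborhood": copy[start:first + context].hex(),
    }


def compare_files(original_path, copy_path):
    """Read both files in binary mode and describe their difference."""
    with open(original_path, "rb") as f:
        original = f.read()
    with open(copy_path, "rb") as f:
        copy = f.read()
    return describe_difference(original, copy)
```

For example, `describe_difference(b"ab\ncd", b"ab\r\ncd")` reports a length difference of 1 and a first mismatch at offset 2, which is what an LF to CRLF conversion in text transfer mode would look like.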

comment:2 Changed 13 years ago by paschott@…

Can't attach the files as they are sensitive and they wouldn't help much as they're corrupted PGP encrypted files. I don't know where the corruption exists as PGP isn't too forthcoming with errors.

Most of the time, the files transfer without any issues, but sometimes a couple of files just won't transfer appropriately. If I use DOS FTP (or some non-Python FTP), I can get the file without problems, but my process usually does: Get Files, If successful, remove remote copy. Code to follow.

I wish I could observe a pattern, but this just doesn't happen with any regularity or in any way I can reproduce. I'll try to ask for some help by getting them to queue up "problem" files in a test folder.

As far as I can tell, there is no difference in file size anywhere. Once I get some of these queued up for testing, I'll do some further testing of the files. No tracebacks or exceptions at any point - would have sent those on straight away. :-)
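If the sizes really are identical, a size check cannot reveal the corruption, but a checksum can. A minimal sketch, assuming Python 3 and assuming a known-good copy of the same file is available locally (for instance one fetched with another FTP client); `file_digest` and `same_content` are illustrative names:

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical bytes (compared via their digests)."""
    return file_digest(path_a) == file_digest(path_b)
```

Two files of equal size with different digests would confirm that bytes are being altered rather than added or dropped.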

Here are some code snippets:

objFTP = ftputil.FTPHost(FTPHostInput, FTPUserInput, FTPPasswordInput)
objFTP.chdir("/ksdata_out")  # Folder storing the OK files.
try:
    FTPRemoteList = objFTP.listdir("/ksdata_in")
except:
    FTPRemoteList = []
    print "No files...."

# This is actually a sublist to filter out folders and such, but you get the idea.
for filename in FTPRemoteList:
    try:
        objFTP.download(filename, filename, "b")
        print filename
        print "Remote size: " + str(objFTP.stat(filename)[6])
        print "Local size: " + str(os.stat(os.path.join(DownloadedPathInput, filename))[6])
    # (The matching except clause is not part of the posted snippet.)

time.sleep(5)
for filename in [filename for filename in FTPRemoteList if (os.path.splitext(filename)[1] in [".sem", ".SEM", ".PGP", ".pgp"])]:
    if os.path.isfile(filename) and os.stat(os.path.join(DownloadedPathInput, filename))[6] == objFTP.stat(filename)[6]:
        print "Deleting " + filename + " from remote site."
        objFTP.remove(filename)

# Close the FTP connection.
objFTP.close()

Changed 13 years ago by schwa

Attachment: random_data added

5120 bytes of pseudo-random data

comment:3 Changed 13 years ago by schwa

Pete,

I added an attachment with 5 KB (5120 bytes) of pseudo-random data (from /dev/urandom). Please download it several (e.g. 10-20) times and attach a zip or tar.gz file of all the downloaded files to this ticket. I'll then look at the downloaded files and see if I find any patterns.
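A rough sketch of how such a test file could be produced and how repeated downloads could be checked against it, assuming Python 3. This is not how the attachment was actually created, beyond the fact that it came from /dev/urandom, and both function names are invented for illustration:

```python
import os


def write_random_file(path, size=5120):
    """Write `size` bytes of OS-provided random data to `path`."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))


def differing_copies(reference_path, copy_paths):
    """Return the paths whose bytes differ from the reference file."""
    with open(reference_path, "rb") as f:
        reference = f.read()
    differing = []
    for path in copy_paths:
        with open(path, "rb") as f:
            if f.read() != reference:
                differing.append(path)
    return differing
```

An empty list from `differing_copies` would mean every downloaded copy matched the reference byte for byte.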

Hint: If you post code, enclose it in triple braces, like this:

{{{
some code
}}}

(I've repeated the braces in the input in the comment field to have them shown in the wiki markup.)

Changed 13 years ago by schwa

Attachment: upload_download_test.py added

Test script for uploads/downloads

comment:4 Changed 13 years ago by schwa

Pete,

I attached a script "upload_download_test.py". It generates pseudo-random data and uploads and downloads it to/from an FTP server. In 500 runs on my local pure-FTPd server, it didn't cause an error. Could you please run it against your FTP server and report the results here?
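The attached script itself is not reproduced in this ticket. As a hedged illustration of the same idea (generate random data, upload, download, compare), here is a sketch in Python 3 in which the transfer functions are passed in as callables; `round_trip_failures` and the remote name `round_trip_test.bin` are assumptions made up for this example:

```python
import os
import tempfile


def round_trip_failures(upload, download, runs=500, size=5120):
    """Count runs in which the downloaded bytes differ from the uploaded ones.

    `upload(local_path, remote_name)` and `download(remote_name, local_path)`
    are caller-supplied callables, e.g. thin wrappers around an FTP client.
    """
    remote_name = "round_trip_test.bin"  # hypothetical remote file name
    failures = 0
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "source.bin")
        result = os.path.join(workdir, "result.bin")
        for _ in range(runs):
            # Fresh pseudo-random payload for every run.
            data = os.urandom(size)
            with open(source, "wb") as f:
                f.write(data)
            upload(source, remote_name)
            download(remote_name, result)
            with open(result, "rb") as f:
                if f.read() != data:
                    failures += 1
    return failures
```

With an ftputil host object like the `objFTP` in comment:2, the callables could be wrappers such as `lambda src, dst: objFTP.download(src, dst, "b")`, keeping the binary mode flag used there.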

In your case, could the cause be too high a load on the server you connect to (for example, causing a race condition)? Is the code you showed in this ticket part of a multithreaded application, i.e. are the file objects used concurrently (perhaps accidentally)?

comment:5 Changed 13 years ago by paschott@…

I will run the script later today. As for the load, I don't think it should be too high. I doubt we're processing more than 200 files at any given time (1/2 of those are triggers, the other 1/2 are PGP encrypted ZIP files). The files should not be used concurrently. I'm running this through Windows Task Scheduler every 10 minutes and nothing else runs the code that I'm aware of. I can see how that could cause problems, though.
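One way to rule out overlapping scheduled runs is a lock file that a second instance refuses to start under. The following is a sketch only, assuming Python 3; `single_instance` and the lock path are hypothetical, and a run that is killed hard would leave a stale lock file behind that has to be removed by hand.

```python
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file for the duration of the with-block.

    Raises FileExistsError if another run already holds the lock.
    """
    # O_EXCL makes the creation fail if the lock file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        # Close before removing so this also works on Windows.
        os.close(fd)
        os.remove(lock_path)
```

The transfer job would then run inside `with single_instance("transfer.lock"):`, and a second run started while the first is still active would fail immediately instead of touching the same files.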

Let me post back the results and see what I get.

Thanks for your help.

-Pete Schott

comment:6 Changed 13 years ago by schwa

Resolution: worksforme
Status: assigned → closed