Timestamp:
Dec 22, 2015, 5:23:41 PM
Author:
Stefan Schwarzer <sschwarzer@…>
Branch:
default
amend_source:
1eb20dc205d030e6c2d16175ec4abee8c320eb27
Message:
Correct and expand section "Directory and file names"

The previous text assumed that `ftputil` would implicitly use the
encoding from `locale.getpreferredencoding`. This is wrong. `ftputil`
uses `ftplib` and (on Python 3) `ftplib` implicitly always uses
latin-1 encoding.
File:
1 edited

  • doc/ftputil.txt

    r1593 r1613  
    244244------------------------
    245245
    246 Methods that take names of directories and files can take either byte
    247 strings (``str`` on Python 2, ``bytes`` on Python 3) or unicode
    248 strings (``unicode`` on Python 2, ``str`` on Python 3).
    249 
    250 Byte strings will be sent to the FTP server as-is. Unicode strings
    251 will be encoded with the encoding returned from
    252 ``locale.getpreferredencoding``. This is the same semantics as for
    253 locally used names in Python 3.
    254 
    255 Methods that take and return a directory or file name will return the
    256 same string type as they're given. For example, if the argument to
    257 ``FTPHost.path.abspath`` is a byte string, you'll get a byte string
    258 back. This behavior is the same as for the local file system API in
    259 Python 2 and 3.
     246.. note::
     247
     248   Keep in mind that this section only applies to directory and file
     249   *names*, not file *contents*. Encoding and decoding for file
     250   contents is handled by the ``encoding`` argument for
     251   `FTPHost.open`_.
     252
     253First off: If your directory and file names (both as
     254arguments and on the server) contain only ISO 8859-1 (latin-1)
     255characters, you can use such names in the form of byte strings or
     256unicode strings. However, you can't mix different string types (bytes
     257and unicode) in one call (for example in ``FTPHost.path.join``).
     258
     259If you have directory or file names with characters that aren't in
     260latin-1, it's recommended to use byte strings. In that case,
     261returned paths will be byte strings, too.
     262
     263Read on for details.
     264
     265.. note::
     266
     267   The approach described below may look awkward, and in a way it is.
     268   The intention of ``ftputil`` is to behave like the local file
     269   system APIs of Python 3 as far as it makes sense. Moreover, the
     270   approach taken ensures that directory and file names that were
     271   used with Python 3's native ``ftplib`` module will be compatible
     272   with ``ftputil`` and vice versa. Otherwise you may be able to use a
     273   file name with ``ftputil``, but get an exception when trying to
     274   read the same file with Python 3's ``ftplib`` module.
     275
     276Methods that take names of directories and/or files can take either
     277byte or unicode strings. If a method gets a string argument and returns
     278one or more strings, these strings will have the same string type as
     279the argument(s). Mixing different string types in one call (for
     280example in ``FTPHost.path.join``) isn't allowed and will cause a
     281``TypeError``. These rules are the same as for local file system
     282operations in Python 3. Since ``ftputil`` uses the same API on
     283Python 2, it follows the same rules when run on Python 2.
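
The type rules in the paragraph above can be illustrated with Python 3's local path API, which the text says ``ftputil`` mirrors. This is only a sketch: it uses the standard library's ``posixpath`` module instead of a real FTP connection, and the helper ``join_mixed`` is a hypothetical name introduced for the example.

```python
import posixpath

# Same string type in, same string type out.
text_path = posixpath.join("dir", "file.txt")      # unicode string result
bytes_path = posixpath.join(b"dir", b"file.txt")   # byte string result


def join_mixed():
    """Try to join a unicode string with a byte string (hypothetical helper)."""
    try:
        posixpath.join("dir", b"file.txt")
    except TypeError:
        # Python 3 refuses to mix str and bytes path components.
        return "TypeError"
    return "no error"
```

According to the text, ``FTPHost.path.join`` should follow the same rules, so a mixed call there would be expected to fail in the same way.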
     284
     285Byte strings for directory and file names will be sent to the server
     286as-is. On the other hand, unicode strings will be encoded to byte
     287strings, assuming latin-1 encoding. This implies that such unicode
     288strings must only contain code points 0-255 for the latin-1 character
     289set. Using any other characters will result in a
     290``UnicodeEncodeError`` exception.
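
To make the latin-1 restriction concrete, the following sketch encodes names the way the paragraph above describes. It does not call ``ftputil`` itself; ``encode_name`` is a hypothetical stand-in for the implicit conversion, assuming it is equivalent to ``str.encode("latin-1")``.

```python
def encode_name(name):
    """Encode a unicode name as latin-1 (hypothetical stand-in for the implicit step)."""
    return name.encode("latin-1")


# U+00C4 (A with diaeresis) is within code points 0-255, so this works.
latin1_bytes = encode_name("\u00c4pfel.txt")

# U+20AC (euro sign) is outside latin-1, so encoding fails.
try:
    encode_name("\u20ac.txt")
    outcome = "encoded"
except UnicodeEncodeError:
    outcome = "UnicodeEncodeError"
```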
     291
     292If you have directory or file names as unicode strings with non-latin-1
     293characters, encode the unicode strings to byte strings yourself, using
     294the encoding you know the server uses. Decode received paths with the
     295same encoding. Encapsulate these conversions as far as you can.
     296Otherwise, you may have to adapt a lot of code if the server
     297encoding changes.
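
One way to encapsulate the conversions recommended above is a pair of small helper functions with the encoding defined in a single place. This is a sketch under assumptions: ``SERVER_ENCODING``, ``to_server`` and ``from_server`` are hypothetical names, and UTF-8 is only an example value for the encoding you know the server uses.

```python
# Assumption: the server stores names as UTF-8. Change it here only.
SERVER_ENCODING = "utf-8"


def to_server(name):
    """Encode a unicode name to the byte string sent to the server."""
    return name.encode(SERVER_ENCODING)


def from_server(raw_name):
    """Decode a byte string received from the server to a unicode name."""
    return raw_name.decode(SERVER_ENCODING)


# Round trip for a name that cannot be represented in latin-1.
raw = to_server("\u20ac.txt")
name = from_server(raw)
```

With helpers like these, a call might pass ``to_server(name)`` to an ``FTPHost`` method and run each returned byte string through ``from_server``. If the server encoding changes, only the constant needs to be updated.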
     298
     299If you *don't* know the encoding on the server side,
     300it's probably best to use only byte strings for directory and file
     301names. That said, as soon as you *show* the names to a user, either
     302you or the library you use for displaying the names has to guess an
     303encoding.
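
When the server encoding is unknown, a display helper can keep the byte string as the real name and decode only for presentation. The sketch below is one possible approach, not something ``ftputil`` provides: ``display_name`` is a hypothetical function, the UTF-8 default is a guess, and undecodable bytes are replaced rather than raising an exception.

```python
def display_name(raw_name, guessed_encoding="utf-8"):
    """Decode a byte-string name for display only (hypothetical helper).

    Bytes that are invalid in the guessed encoding become U+FFFD, so the
    result must not be encoded again and sent back to the server.
    """
    return raw_name.decode(guessed_encoding, errors="replace")


raw_name = b"caf\xe9.txt"  # latin-1 bytes, invalid as UTF-8

shown_with_wrong_guess = display_name(raw_name)
shown_with_right_guess = display_name(raw_name, "latin-1")
```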
    260304
    261305
     
    10711115
    10721116
     1117.. _`FTPHost.open`:
     1118
    10731119File-like objects
    10741120-----------------