243 | | Directory and file names |
244 | | ------------------------ |
245 | | |
246 | | .. note:: |
247 | | |
248 | | Keep in mind that this section only applies to directory and file |
249 | | *names*, not file *contents*. Encoding and decoding for file |
250 | | contents is handled by the ``encoding`` argument for |
251 | | `FTPHost.open`_. |
252 | | |
253 | | First off: If your directory and file names (both as arguments and on |
254 | | the server) contain only ISO 8859-1 (latin-1) characters, you can use |
255 | | such names in the form of ``bytes`` or ``str`` objects. However, you |
256 | | can't mix different string types (``bytes`` and ``str``) in one call |
257 | | (for example in ``FTPHost.path.join``). |
258 | | |
259 | | If you have directory or file names with characters that aren't in |
260 | | latin-1, it's recommended to use ``bytes`` objects. In that case, |
261 | | returned paths will be ``bytes`` objects, too. |
262 | | |
263 | | Read on for details. |
264 | | |
265 | | .. note:: |
266 | | |
267 | | The approach described below may look awkward and in a way it is. |
268 | | The intention of ``ftputil`` is to behave like the local file |
269 | | system APIs of Python 3 as far as it makes sense. Moreover, the |
270 | | taken approach makes sure that directory and file names that were |
271 | | used with Python 3's native ``ftplib`` module will be compatible |
272 | | with ``ftputil`` and vice versa. Otherwise you may be able to use a |
273 | | file name with ``ftputil``, but get an exception when trying to |
274 | | read the same file with Python 3's ``ftplib`` module. |
275 | | |
276 | | Methods that take paths of directories and/or files can take either |
277 | | ``bytes`` or ``str`` objects, or `PathLike`_ objects that can be |
278 | | converted to ``bytes`` or ``str``. |
279 | | |
280 | | .. _PathLike: https://docs.python.org/3/library/os.html#os.PathLike |
281 | | |
282 | | If a method gets a string argument (or a string argument wrapped in a |
283 | | PathLike_ object) and returns one or more strings, these strings will |
284 | | have the same string type (``bytes`` or ``str``) as the argument(s). |
285 | | Mixing different string types in one call (for example in |
286 | | ``FTPHost.path.join``) isn't allowed and will cause a ``TypeError``. |
287 | | These rules are the same as for local file system operations in Python 3. |
288 | | |
289 | | ``bytes`` objects for directory and file names will be sent to the |
290 | | server as-is. On the other hand, ``str`` objects will be encoded to |
291 | | ``bytes`` objects, assuming latin-1 encoding. This implies that such |
292 | | ``str`` objects must only contain code points 0-255 for the latin-1 |
293 | | character set. Using any other characters will result in a |
294 | | ``UnicodeEncodeError`` exception. |
295 | | |
296 | | If you have directory or file names as ``str`` objects with |
297 | | non-latin-1 characters, encode the strings to ``bytes`` yourself, |
298 | | using the encoding you know the server uses for its file system. |
299 | | Decode received paths with the same encoding. Encapsulate these |
300 | | conversions as far as you can. Otherwise, you'd have to adapt |
301 | | potentially a lot of code if the server encoding changes. |
302 | | |
303 | | If you *don't* know the encoding on the server side, it's probably |
304 | | best to use only ``bytes`` for directory and file names. That said, as |
305 | | soon as you *show* the names to a user, you -- or the library you use |
306 | | for displaying the names -- have to guess an encoding. |
307 | | |
308 | | If you can decide about paths yourself, it's generally safest to use |
309 | | only ASCII characters in FTP paths. |
310 | | |
311 | | |
| 364 | - ``encoding`` can be a string to set the encoding of directory and |
| 365 | file paths on the remote server. (This has nothing to do with the |
| 366 | encoding of file contents!) If you pass a string and your base class |
| 367 | is neither ``ftplib.FTP`` nor ``ftplib.FTP_TLS``, the heuristic used |
| 368 | in ``session_factory`` may not work reliably. Therefore, if in |
| 369 | doubt, let ``encoding`` be ``None`` and define your ``base_class`` |
| 370 | so that it sets the encoding you want. |
| 371 | |
| 372 | Note: In Python 3.9, the default path encoding for ``ftplib.FTP`` |
| 373 | and ``ftplib.FTP_TLS`` changed from "latin-1" to "utf-8". |
| 374 | Hence, if you don't pass an ``encoding`` to ``session_factory``, |
| 375 | you'll get different path encodings for Python 3.8 and earlier vs. |
| 376 | Python 3.9 and later. |
| 377 | |
| 378 | If you're sure that you always use only ASCII characters in your |
| 379 | remote paths, you don't need to worry about the path encoding and |
| 380 | don't need to use the ``encoding`` argument. |
| 381 | |
455 | | as described at the start of this section. However, the class |
456 | | ``M2Crypto.ftpslib.FTP_TLS`` has a limitation so that you can't use |
457 | | it with ftputil out of the box. The function ``session_factory`` |
458 | | contains a workaround for this limitation. For details refer to `this |
459 | | ticket`_. |
460 | | |
461 | | .. _`this ticket`: https://ftputil.sschwarzer.net/trac/ticket/78 |
| 415 | as described at the start of this section. |
| 416 | |
| 417 | |
| 418 | Directory and file names |
| 419 | ~~~~~~~~~~~~~~~~~~~~~~~~ |
| 420 | |
| 421 | .. note:: |
| 422 | |
| 423 | Keep in mind that this section only applies to directory and file |
| 424 | *names*, not file *contents*. Encoding and decoding for file |
| 425 | contents is handled by the ``encoding`` argument for |
| 426 | `FTPHost.open`_. |
| 427 | |
| 428 | Generally, paths can be ``str`` or ``bytes`` objects (or |
| 429 | `PathLike`_ objects wrapping ``str`` or ``bytes``). However, you |
| 430 | can't mix different string types (``bytes`` and ``str``) in one |
| 431 | call (for example in ``FTPHost.path.join``); doing so causes a |
| 432 | ``TypeError``. If a method gets a string argument (or a string |
| 433 | argument wrapped in a PathLike_ object) and returns one or more |
| 434 | strings, these strings will have the same string type (``bytes`` |
| 435 | or ``str``) as the argument(s). These rules are the same as for |
| 436 | local file system operations in Python, for example the |
| 437 | functions in ``os.path``. |
| 438 | |
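The string type rules can be tried out locally. The following is a minimal sketch that uses the standard library's ``posixpath.join`` as a stand-in for ``FTPHost.path.join``, on the assumption (taken from the paragraph above) that both follow the same ``str``/``bytes`` rules. The directory and file names are made up for illustration.

```python
import posixpath

# Stand-in for ``FTPHost.path.join``: ``posixpath.join`` is the local
# counterpart and is assumed to follow the same str/bytes rules.

# All arguments are ``str``, so the result is ``str``.
str_path = posixpath.join("dir", "file.txt")

# All arguments are ``bytes``, so the result is ``bytes``.
bytes_path = posixpath.join(b"dir", b"file.txt")

# Mixing ``str`` and ``bytes`` in one call is rejected.
try:
    posixpath.join("dir", b"file.txt")
except TypeError:
    mixing_raises_type_error = True
else:
    mixing_raises_type_error = False
```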
| 439 | .. _PathLike: https://docs.python.org/3/library/os.html#os.PathLike |
| 440 | |
| 441 | Although you can pass paths as ``str`` or ``bytes``, the former is |
| 442 | recommended. See below for the reason. |
| 443 | |
| 444 | *If* you have directory or file names with non-ASCII characters, you |
| 445 | need to be aware of the encoding the `session factory`_ (e. g. |
| 446 | ``ftplib.FTP``) uses. This needs to be the same encoding that the FTP |
| 447 | server uses for the paths. |
| 448 | |
| 449 | The following diagram shows string conversions on the way from your |
| 450 | code to the remote FTP server. The opposite way works analogously, so |
| 451 | encoding steps in the diagram become decoding steps and decoding steps |
| 452 | in the diagram become encoding steps. |
| 453 | |
| 454 | The two "branching points" in the upper and lower parts of the diagram |
| 455 | are independent, so depending on how you pass paths to ftputil and |
| 456 | which file system API the FTP server uses, there are four possible |
| 457 | combinations. |
| 458 | |
| 459 | :: |
| 460 | |
| 461 | +-----------+ +-----------+ |
| 462 | | Your code | | Your code | |
| 463 | +-----------+ +-----------+ |
| 464 | | | |
| 465 | | str | bytes |
| 466 | v v |
| 467 | +-------------+ +-------------+ decode with encoding of session, |
| 468 | | ftputil API | | ftputil API | e. g. `ftplib.FTP` instance |
| 469 | +-------------+ +-------------+ |
| 470 | \ / |
| 471 | \ str / |
| 472 | v v |
| 473 | +---------------+ encode with encoding |
| 474 | | ftplib API | specified in `FTP` instance |
| 475 | +---------------+ |
| 476 | | |
| 477 | | bytes |
| 478 | v |
| 479 | +-------------+ |
| 480 | | socket API | |
| 481 | +-------------+ |
| 482 | / \ |
| 483 | / \ local / client |
| 484 | - - - - - / - - - - - \ - - - - - - - - - - - - - - - - - - - - - - |
| 485 | / \ remote / server |
| 486 | / bytes \ |
| 487 | v v |
| 488 | +------------+ +------------+ decode with encoding from |
| 489 | | FTP server | | FTP server | FTP server configuration |
| 490 | +------------+ +------------+ |
| 491 | | | |
| 492 | | bytes | str |
| 493 | v v |
| 494 | +-------------+ +-------------+ |
| 495 | | remote file | | remote file | |
| 496 | | system API | | system API | |
| 497 | +-------------+ +-------------+ |
| 498 | \ / |
| 499 | \ bytes / |
| 500 | v v |
| 501 | +-------------------+ |
| 502 | | file system | |
| 503 | +-------------------+ |
| 504 | |
| 505 | As you can see at the top of the diagram, if you use ``str`` objects |
| 506 | (regular Unicode strings), there's one fewer decoding step, and so one |
| 507 | fewer source of problems. If you use ``bytes`` objects for paths, |
| 508 | ftputil tries to get the encoding for the FTP server from the |
| 509 | ``encoding`` attribute of the session instance (say, an instance of |
| 510 | ``ftplib.FTP``). If no ``encoding`` attribute is present, a |
| 511 | ``NoEncodingError`` is raised. |
| 512 | |
| 513 | All encoding/decoding steps must use the same encoding, namely the |
| 514 | encoding the server uses (at the bottom of the diagram). If the server |
| 515 | passes the bytes from the socket on directly, i. e. without a decoding |
| 516 | step, you have to use the encoding of the remote file system. |
| 517 | |
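To illustrate why the encodings have to match, here is a small sketch that uses only Python's built-in string methods. It doesn't involve ftputil or a server. The file name is a made-up example, and the mix of UTF-8 on one side and latin-1 on the other is just one possible mismatch.

```python
# Hypothetical file name with one non-ASCII character.
name = "Käse.txt"

# One side encodes the path with UTF-8 ...
wire_bytes = name.encode("utf-8")

# ... but the other side decodes with latin-1. This raises no error,
# yet the resulting name is wrong: the two UTF-8 bytes of "ä" turn
# into two separate latin-1 characters.
garbled = wire_bytes.decode("latin-1")

# With the same encoding on both sides, the name survives unchanged.
round_trip = wire_bytes.decode("utf-8")
```

Note that the mismatch goes unnoticed at the time of decoding, since every byte sequence is valid latin-1. The problem only shows up later as a wrong or missing file name.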
| 518 | Up to and including Python 3.8, the encoding implicitly assumed by |
| 519 | the ``ftplib`` module was latin-1, so using ``bytes`` was the safest |
| 520 | strategy. However, Python 3.9 made the encoding configurable via the |
| 521 | ``encoding`` argument of the ``ftplib.FTP`` constructor, and the |
| 522 | argument *defaults to UTF-8*. |
| 523 | |
| 524 | If you don't pass a `session factory`_ to the ``ftputil.FTPHost`` |
| 525 | constructor, ftputil will use latin-1 encoding for the paths. This is |
| 526 | the same value as in earlier ftputil versions in combination with |
| 527 | Python 3.8 and earlier. |
| 528 | |
| 529 | Summary: |
| 530 | |
| 531 | - If possible, use only ASCII characters in paths. |
| 532 | - If possible, pass paths to ftputil as ``str``, not ``bytes``. |
| 533 | - If you use a custom session factory, the session instances created |
| 534 | by the factory must have an ``encoding`` attribute with the name of |
| 535 | the path encoding to use. If your session instances don't have an |
| 536 | ``encoding`` attribute, ftputil raises a ``NoEncodingError`` when |
| 537 | the session is created. |
| 538 | |
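As an illustration of the last point, the following sketch shows one way a session class could set its path encoding explicitly, so that the result doesn't depend on the Python version. The class name ``Utf8Session`` and its constructor parameters are hypothetical. Only ``ftplib.FTP`` and its ``connect`` and ``login`` methods and ``encoding`` attribute come from the standard library.

```python
import ftplib


class Utf8Session(ftplib.FTP):
    """Hypothetical session class that always uses UTF-8 for paths."""

    def __init__(self, host="", user="", password=""):
        # Call the base constructor without a host, so that it doesn't
        # connect yet. This allows setting the encoding before any
        # data is exchanged with the server.
        super().__init__()
        self.encoding = "utf-8"
        # Connect only if a host was given. This also makes it
        # possible to create an instance without network access.
        if host:
            self.connect(host)
            self.login(user, password)
```

Under these assumptions, such a class could be passed as the ``session_factory`` argument of the ``ftputil.FTPHost`` constructor. Because every instance has an ``encoding`` attribute, the requirement from the summary above would be met.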