mirror of
https://github.com/pacnpal/django-anymail.git
synced 2025-12-20 03:41:05 -05:00
Inbound: fix charset handling in .text, .html, .get_content_text()
Make `AnymailInboundMessage.text`, `.html`, and `.get_content_text()` usually do the right thing for non-UTF-8 messages/attachments. Fixes an incorrect UnicodeDecodeError when receiving (e.g.) an ISO-8859-1 encoded message, and improves handling for inbound messages that were not properly encoded by the sender.

* Decode using the message's (or attachment's) declared charset by default (rather than always defaulting to 'utf-8'); you can still override with `get_content_text(charset=...)`.
* Add `errors` param to `get_content_text()`, defaulting to 'replace'. Mis-encoded messages will now use the Unicode replacement character rather than raising errors. (Use `get_content_text(errors='strict')` for the previous behavior.)
@@ -363,11 +363,22 @@ have these methods:
 (Anymail back-ports Python 3.5's :meth:`~email.message.Message.get_content_disposition`
 method to all supported versions.)

-.. method:: get_content_text(charset='utf-8')
+.. method:: get_content_text(charset=None, errors='replace')

-Returns the content of the attachment decoded to a `str` in the given charset.
+Returns the content of the attachment decoded to Unicode text.
 (This is generally only appropriate for text or message-type attachments.)

+If provided, charset will override the attachment's declared charset. (This can be useful
+if you know the attachment's :mailheader:`Content-Type` has a missing or incorrect charset.)
+
+The errors param is as in :meth:`~bytes.decode`. The default "replace" substitutes the
+Unicode "replacement character" for any illegal characters in the text.
+
+.. versionchanged:: 2.1
+
+Changed to use attachment's declared charset by default,
+and added errors option defaulting to replace.
+
 .. method:: get_content_bytes()

 Returns the raw content of the attachment as bytes. (This will automatically decode
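The decoding rules described in this change can be illustrated with the standard library alone. The following is a minimal sketch, not Anymail's implementation: `decode_part_text` is a hypothetical helper written for this example, and falling back to UTF-8 when no charset is declared is an assumption. It only mirrors the documented semantics: an explicit `charset` overrides the declared one, and `errors` is passed through to `bytes.decode`.

```python
from email import message_from_bytes
from email.message import Message


def decode_part_text(part: Message, charset=None, errors="replace") -> str:
    """Hypothetical helper (not part of Anymail) mirroring the documented rules."""
    # Raw bytes of the part, after undoing any Content-Transfer-Encoding
    payload = part.get_payload(decode=True)
    if payload is None:  # multipart containers have no payload of their own
        return ""
    # An explicit charset wins; otherwise use the declared one.
    # Falling back to UTF-8 when nothing is declared is an assumption here.
    charset = charset or part.get_content_charset() or "utf-8"
    return payload.decode(charset, errors=errors)


# A correctly labeled ISO-8859-1 part: "café" with a single-byte e-acute
latin1_part = message_from_bytes(
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9"
)

# The same bytes, but mislabeled by the sender as UTF-8
mislabeled_part = message_from_bytes(
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9"
)

declared = decode_part_text(latin1_part)                             # declared charset is used
replaced = decode_part_text(mislabeled_part)                         # bad byte becomes U+FFFD
overridden = decode_part_text(mislabeled_part, charset="iso-8859-1") # caller overrides charset
```

Passing `errors="strict"` for the mislabeled part raises `UnicodeDecodeError`, which corresponds to the pre-change behavior the commit message mentions.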