Make `AnymailInboundMessage.text`, `.html` and `.get_content_text()`
usually do the right thing for non-UTF-8 messages/attachments. Fixes
an incorrect UnicodeDecodeError when receiving an (e.g.,) ISO-8859-1
encoded message, and improves handling for inbound messages that were
not properly encoded by the sender.
* Decode using the message's (or attachments's) declared charset
by default (rather than always defaulting to 'utf-8'; you can
still override with `get_content_text(charset=...)`
* Add `errors` param to `get_content_text()`, defaulting to 'replace'.
Mis-encoded messages will now use the Unicode replacement character
rather than raising errors. (Use `get_content_text(errors='strict')`
for the previous behavior.)
In AnymailInboundMessage, work around Python 2 email.parser.Parser's
lack of handling for RFC2047-encoded email headers. (The Python 3 email
package already decodes these automatically.)
Improves inbound handling on Python 2 for all ESPs that provide raw
MIME email or raw headers with inbound events. (Mailgun, Mandrill,
SendGrid, SparkPost.)
Work around Python 2 email.parser.Parser bug handling RFC5322 folded
headers. Fixes problems where long headers in inbound mail (e.g.,
Subject) get truncated or have unexpected spaces.
This change also updates AnymailInboundMessage.parse_raw_mime to use
the improved "default" email.policy on Python 3 (rather than the
default "compat32" policy). This likely fixes several other parsing
bugs that will still affect code running on Python 2.
Improves inbound parsing for all ESPs that provide raw MIME email.
(Mailgun, Mandrill, SendGrid, SparkPost)