Inbound: correctly parse long (folded) headers in raw MIME messages

Work around a Python 2 email.parser.Parser bug in handling RFC 5322
folded headers. Fixes problems where long headers in inbound mail (e.g.,
Subject) were truncated or contained unexpected spaces.
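
For reference, a minimal sketch of what "unfolding" means under RFC 5322:
each line break that is immediately followed by whitespace is dropped, and
the whitespace itself is kept. The helper name unfold_header is hypothetical
and used only for illustration; this is not the workaround code in this
commit.

```python
import re


def unfold_header(raw_value):
    """Unfold a folded header value (hypothetical helper, illustration only).

    Removes each CRLF or LF that is directly followed by a space or tab,
    leaving that whitespace in place.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", raw_value)


folded = "This subject uses\n header folding"
print(unfold_header(folded))
```

A line break that is not followed by whitespace is left alone, since it ends
the header rather than continuing it.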

This change also updates AnymailInboundMessage.parse_raw_mime to use
the improved "default" email.policy on Python 3 (rather than the
default "compat32" policy). On Python 3, this likely fixes several other
parsing bugs; those bugs will still affect code running on Python 2.
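
As a rough illustration of the policy difference on Python 3, the sketch
below parses the same folded Subject header with both policies using only
the standard library. The sample message is made up for this example, and
the snippet does not reproduce the Python 2 bug itself.

```python
from email import policy
from email.parser import Parser

# Made-up sample message with a folded Subject header.
RAW = (
    "Content-Type: text/plain\n"
    "Subject: This subject uses\n"
    " header folding\n"
    "\n"
    "Body text.\n"
)

# compat32 returns the header value with the embedded line break intact.
compat_msg = Parser(policy=policy.compat32).parsestr(RAW)

# The "default" policy unfolds the header when it is fetched.
default_msg = Parser(policy=policy.default).parsestr(RAW)

print(repr(compat_msg["Subject"]))
print(repr(default_msg["Subject"]))
```

With compat32, callers have to unfold header values themselves; the
"default" policy does it when the header is read.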

Improves inbound parsing for all ESPs that provide raw MIME email
(Mailgun, Mandrill, SendGrid, SparkPost).
medmunds
2018-03-23 16:56:45 -07:00
parent 0c3e3e9bad
commit 70094cf3bc
2 changed files with 70 additions and 26 deletions


@@ -1,3 +1,4 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from base64 import b64encode
@@ -387,3 +388,28 @@ class AnymailInboundMessageAttachedMessageTests(SimpleTestCase):
        self.assertIsInstance(orig_msg, AnymailInboundMessage)
        self.assertEqual(orig_msg['Subject'], "Original message")
        self.assertEqual(orig_msg.get_content_type(), "multipart/related")


class EmailParserWorkaroundTests(SimpleTestCase):
    # Anymail includes workarounds for (some of) the more problematic bugs
    # in the Python 2 email.parser.Parser.

    def test_parse_folded_headers(self):
        raw = dedent("""\
            Content-Type: text/plain
            Subject: This subject uses
             header folding
            X-Json: {"problematic":
             ["encoded newline\\n",
             "comma,semi;no space"]}

            Not-A-Header: This is the body.
             It is not folded.
            """)
        msg = AnymailInboundMessage.parse_raw_mime(raw)
        self.assertEqual(msg['Subject'], "This subject uses header folding")
        self.assertEqual(msg["X-Json"],
                         '{"problematic": ["encoded newline\\n", "comma,semi;no space"]}')
        self.assertEqual(msg.get_content_text(),
                         "Not-A-Header: This is the body.\n It is not folded.\n")
        self.assertEqual(msg.defects, [])