diff --git a/spec/fixtures/emails/email_18_stripped.html b/spec/fixtures/emails/email_18_stripped.html
index 6a082a391..bed15f86f 100644
--- a/spec/fixtures/emails/email_18_stripped.html
+++ b/spec/fixtures/emails/email_18_stripped.html
@@ -16,453 +16,5 @@

-
On Thu, Mar 3, 2016 at 3:19 AM, Nylas <test@nylas.com> wrote:
-
-
Hey Recipient, -
-

-
-
-
Checking in -- will you guys be needing to test with 10+ accounts soon?
- -
-

-
-
-
Best,
-
Nylas
-
-
-- 
-
-
-
-
-
-
-
-
-
-
-
-
-
Test Sender
-
-
Head of Business Development and Growth
-
Nylas Inc.
-

-
- -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- -
-
-
-On Feb 10 2016, at 3:28 am, Recipient Name <email.name@nylas.com> wrote: -
-
Fantastic! Thank you, Nylas. -
-

-
-
-
Have a good day,
-
Recipient
-
-
-

-
-
On Wed, Feb 10, 2016 at 1:27 AM, Test Sender <test@nylas.com> wrote:
-
-
Hi Recipient, -
-
-

-
-
-
CONTENT 4
-
-

-
-
-
CONTENT 5
- -
-

-
-
-
Best,
-
Nylas
-
-

-
-
-
-- 
-
-
-
-
-
-
-
-
-
-
-
-
-
Test Sender
-
-
Head of Business Development and Growth
-
Nylas Inc.
-
num
-
- -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-On Feb 9 2016, at 12:49 am, Recipient Name <email.name@nylas.com> wrote: -
-
Hi Nylas, -
-

-
-
-
Content 1
-
-Content 2
-
-Regards,
-Recipient
-
-
-

-
-
On Tue, Feb 9, 2016 at 1:37 AM, Test Sender <test@nylas.com> wrote:
-
-
Thanks APerson! -
-
-

-
-
-
Content 3 -
- -
-

-
-
-
Best,
-
Nylas
-
-
-- 
-
-
-
-
-
-
-
-
-
-
-
-
-
Test Sender
-
-
Head of Business Development and Growth
-
Nylas Inc.
-
num
-
- -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-On Feb 8 2016, at 3:33 pm, Another Person <another.email@nylas.com> wrote: -
-
CONTENT 8 -
-

-
-
-
Regards,
-
APerson
-
-

-
-
-
-
-
-
-
-
-
-
-
Another Person
-
Co-founder & President of Pipedrive
- - - -
-
-
-
-
-
-
-
-
-
-
On Mon, Feb 8, 2016 at 2:27 PM, Test Sender <test@nylas.com> wrote:
-
-
Hi APerson, -
-
-

-
-
-
CONTENT 9
-
-

-
-
-
CONTENT 10
-
-

-
-
-
Best,
-
-

-
-
-
Nylas
- -
-

-
-
-
-
-- 
-
-
-
-
-
-
-
-
-
-
-
-
-
Test Sender
-
-
Head of Business Development and Growth
-
Nylas Inc.
-
num
-
- -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-On Feb 8 2016, at 1:40 pm, Another Person <another.email@nylas.com> wrote: -
-
Hey Nylas, -
-

-
-
-
CONTENT 11
-
-

-
-
-
CONTENT 12
-
-
-
CONTENT 13
-
-

-
-CONTENT 14
-
-
-

-
-
-
CONTENT 15
-
-

-
-
-
Regards,
-
APerson 
-
-
-

-
-
-
-
-
-
-
-
-
-
-
Another Person
-
Co-founder & President of Place
- - - -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-

-
-
---
-
-
-
-
Test Person
-
Product Manager | Pipedrive
-
@testtwitter
-
-
-
-
-
-
-
-
-
-
-
-
-
-

-
-
---
-
-
-
-
Test Person
-
Product Manager | Pipedrive
-
@testtwitter
-
-
-
-
-
-
-
-
-
-
-
-
-

-
---
-
-
-
-
Test Person
-
Product Manager | Pipedrive
-
@testtwitter
-
-
-
-
-
-
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/spec/fixtures/emails/email_20_stripped.html b/spec/fixtures/emails/email_20_stripped.html
index de5fdb29b..cd6e63f47 100644
--- a/spec/fixtures/emails/email_20_stripped.html
+++ b/spec/fixtures/emails/email_20_stripped.html
@@ -1,2 +1 @@
-
Yaaay!  So excited :)  And no worries, see you in PR, if not before





-
+
Yaaay!  So excited :)  And no worries, see you in PR, if not before

\ No newline at end of file
diff --git a/spec/fixtures/emails/email_23.html b/spec/fixtures/emails/email_23.html
new file mode 100644
index 000000000..68db4e81d
--- /dev/null
+++ b/spec/fixtures/emails/email_23.html
@@ -0,0 +1,484 @@
+
Hi,

Thank you

Text
Text

Thank you again,
Name

On Thu, Nov 3, 2016 at 3:34 PM, Evan Morikawa <evan@nylas.com> wrote:
+ + + +
+Name, +
+

+
+
Text A
+

+
+
Text B
+

+
+
Text C
+

+
+
Text D
+

+
+
Best,
+
Evan
+ +
+
+

+On Nov 1 2016, at 11:21 am, Halla Moore <Halla@nylas.com> wrote:
+
+ +Hi Name, +
+

+
+
Nice to e-meet you too! :)
+
+

+
+
Speak to you soon,
+
Halla
+

+On Nov 1 2016, at 11:18 am, Last, Name <name_wu@brown.edu> wrote:
+
+ +
+
Hi Evan and Halla,
+

+
+
Halla -- nice to e-meet you! I'm looking forward to speaking. :)
+

+
+
Thank you,
+
Name
+
+

+
On Tue, Nov 1, 2016 at 1:42 PM, Evan Morikawa <evan@nylas.com> wrote:
+
+
Name, +

+
+
Text J
+

+
+
You should also get a calendar invite shortly.
+

+
+
Evan
+ +
+
+

+On Oct 31 2016, at 5:22 pm, Last, Name <name_wu@brown.edu> wrote: +
+
+
+
Hi Evan,
+

+
+
Yes, that works! Looking forward to it.
+

+
+
Name
+
+

+
On Mon, Oct 31, 2016 at 2:49 PM, Evan Morikawa <evan@nylas.com> wrote:
+
+
Name, how about tomorrow, Tuesday 11/1 at 7:00pm EDT (4:00pm PDT)?
+
+Evan +
+
+

+On Oct 28 2016, at 10:16 am, Last, Name <name_wu@brown.edu> wrote: +
+
+
+
Hi Evan,
+

+
+
Thank you for your patience; I've been recovering from a cold this week.
+

+
+
Text H
+

+
+
Thank you very much,
+
Name
+

+
On Tue, Oct 25, 2016 at 11:51 AM, Evan Morikawa <evan@nylas.com> wrote:
+
+
Name, +

+
+
Text I
+

+
+
Text J
+

+
+
Text K
+

+
+
Really looking forward to chatting more soon.
+

+
+
Evan
+ +
+
+

+On Oct 24 2016, at 11:54 pm, Last, Name <name_wu@brown.edu> wrote: +
+
+
+
Hello Evan,
+

+
+
Text L
+

+
+
Cheers,
+
Name
+
+

+
On Wed, Oct 19, 2016 at 5:59 PM, Last, Name <name_wu@brown.edu> wrote:
+
+
Hi Evan, +

+
+
Text M
+ +

+
+
Name
+
+
+
+

+
On Wed, Oct 19, 2016 at 12:40 PM, Evan Morikawa <evan@nylas.com> wrote:
+
+
Great! +
+

+
+
I just sent over a calendar invite. Join me on this Google Hangout: https://plus.google.com/hangouts/_/nylas.com/evan-julia this + Friday.
+

+
+
Talk soon
+ +
Evan
+ +
+Sent from +Nylas N1, the extensible, open source mail client.
+ +
+
+

+On Oct 18 2016, at 7:21 pm, Last, Name <name_wu@brown.edu> wrote: +
+
+
Hi Evan, +

+
+
Sounds great! My number is 555-5555.
+

+
+
Looking forward to speaking,
+
Name
+
+

+
On Tue, Oct 18, 2016 at 4:12 PM, Evan Morikawa <evan@nylas.com> wrote:
+
+
Name, (Michael to bcc), +
+

+
+
Friday works. How about 30 min this Friday 10/21 at 11:30am PDT (2:30pm EDT)?
+

+
+
Evan
+
+Sent from +Nylas N1, the extensible, open source mail client.
+ +
+
+

+On Oct 18 2016, at 2:03 pm, Last, Name <name_wu@brown.edu> wrote: +
+
+
Hi Evan, +

+
+
Text N
+
Text O
+

+
+
Thanks a lot,
+
Name
+

+
On Mon, Oct 17, 2016 at 1:24 PM, Evan Morikawa <evan@nylas.com> wrote:
+
+
Juila, +
+

+
+
Text P
+

+
+
Have time early this week?
+

+
+
Evan
+ + +
+Sent from +Nylas N1, the extensible, open source mail client.
+ +
+
+

+On Oct 7 2016, at 9:17 am, Last, Name <name_wu@brown.edu> wrote: +
+
+
Hi Evan, +

+
+
Text Q
+

+
+
Text R
+

+
+
I look forward to staying in touch. :)
+

+
+
Thank you very much,
+
Name
+

+
+
+

+
On Mon, Oct 3, 2016 at 5:17 PM, Evan Morikawa <evan@nylas.com> wrote:
+
+
Name, +
+

+
+
Text S
+

+
+
Evan
+
Nylas
+ +
+Sent from +Nylas N1, the extensible, open source mail client.
+ +
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+

+
+--
+
+
+
+
Name Last +
Sc.B., Some Major
+
Some University | Class of 2028
+

+
+
+
+
+
+
+
+
+
+
+
+ +



--
Name Last
Sc.B., Some Major
Some University | Class of 2028

+
diff --git a/spec/fixtures/emails/email_23_stripped.html b/spec/fixtures/emails/email_23_stripped.html
new file mode 100644
index 000000000..57c454b3e
--- /dev/null
+++ b/spec/fixtures/emails/email_23_stripped.html
@@ -0,0 +1 @@
+
Hi,

Thank you

Text
Text

Thank you again,
Name

\ No newline at end of file
diff --git a/spec/services/quoted-html-transformer-spec.coffee b/spec/services/quoted-html-transformer-spec.coffee
index 7bc2bd37a..b349f7dea 100644
--- a/spec/services/quoted-html-transformer-spec.coffee
+++ b/spec/services/quoted-html-transformer-spec.coffee
@@ -19,7 +19,7 @@ describe "QuotedHTMLTransformer", ->
     re = new RegExp(QuotedHTMLTransformer.annotationClass, 'g')
     html.match(re)?.length ? 0
 
-  [1..22].forEach (n) ->
+  [1..23].forEach (n) ->
     it "properly parses email_#{n}", ->
       opts = keepIfWholeBodyIsQuote: true
       expect(removeQuotedHTML("email_#{n}.html", opts).trim()).toEqual(readFile("email_#{n}_stripped.html").trim())
@@ -405,7 +405,7 @@ describe "QuotedHTMLTransformer", ->
   # `QuotedHTMLTransformer` needs Electron booted up in order to work because
   # of the DOMParser.
   xit "Run this simple function to generate output files", ->
-    [22].forEach (n) ->
+    [18, 20].forEach (n) ->
       newHTML = QuotedHTMLTransformer.removeQuotedHTML(readFile("email_#{n}.html"))
       outPath = path.resolve(__dirname, '..', 'fixtures', 'emails', "email_#{n}_raw_stripped.html")
       fs.writeFileSync(outPath, newHTML)
diff --git a/src/services/quoted-html-transformer.es6 b/src/services/quoted-html-transformer.es6
index a113dbd59..086f62f43 100644
--- a/src/services/quoted-html-transformer.es6
+++ b/src/services/quoted-html-transformer.es6
@@ -164,17 +164,36 @@ class QuotedHTMLTransformer {
      * message. We detect this case (by looking for signature text
      * repetition) and add it to the set of flagged quote candidates.
      */
-    quoteElements = quoteElements.concat(unwrappedSignatureDetector(doc, quoteElements))
+    const unwrappedSignatureNodes = unwrappedSignatureDetector(doc, quoteElements)
+    quoteElements = quoteElements.concat(unwrappedSignatureNodes)
 
     if (!includeInline && quoteElements.length > 0) {
-      // This means we only want to remove quoted text that shows up at the
-      // end of a message. If there were non quoted content after, it'd be
-      // inline.
-
       const trailingQuotes = this._findTrailingQuotes(doc, Array.from(quoteElements));
 
-      // Only keep the trailing quotes so we can delete them.
+      /**
+       * The _findTrailingQuotes method will return an array of the quote
+       * elements we should remove. If there was no trailing text, it
+       * should include all of the existing VISIBLE quoteElements. If
+       * there was trailing text, it will only include the quote elements
+       * up to that trailing text. The intersection below will only
+       * mark the quote elements below trailing text to be deleted.
+       */
       quoteElements = _.intersection(quoteElements, trailingQuotes);
+
+      /**
+       * The _findTrailingQuotes method only preserves VISIBLE elements.
+       * It's possible that the unwrappedSignatureDetector discovered a
+       * collection of nodes with both visible and not visible (like br)
+       * content. If we're going to get rid of trailing signatures we
+       * need to also remove those trailing <br>s, or we can get a bunch
+       * of blank space at the end of the text. First make sure that some
+       * of our unwrappedSignatureNodes were marked for deletion, and then
+       * make sure we include all of them.
+       */
+      if (_.intersection(quoteElements, unwrappedSignatureNodes).length > 0) {
+        quoteElements = _.uniq(quoteElements.concat(unwrappedSignatureNodes))
+      }
     }
 
     return _.compact(_.uniq(quoteElements));
@@ -192,6 +211,8 @@ class QuotedHTMLTransformer {
    * unique text that a user wrote. We return at that point assuming that
    * everything at the text and above should be visible, even if it's a
    * quoted text candidate.
+   *
+   * See email_18 and email_23 and unwrapped-signature-detector
    */
   _findTrailingQuotes(scopeElement, quoteElements = []) {
     let trailingQuotes = [];
diff --git a/src/services/unwrapped-signature-detector.es6 b/src/services/unwrapped-signature-detector.es6
index 0c1ee8477..d6773daff 100644
--- a/src/services/unwrapped-signature-detector.es6
+++ b/src/services/unwrapped-signature-detector.es6
@@ -23,21 +23,30 @@ function textAndNodesAfterNode(node) {
  * it looks very similar to someone writing inline regular text after some
  * quoted text (which is allowed).
  *
- * See email_20 and email_21 as a test case for this.
+ * See email_18, email_20, email_21, and email_23 test cases for this.
  */
 export default function unwrappedSignatureDetector(doc, quoteElements) {
   // Find the last quoteBlock
   for (const node of DOMWalkers.walkBackwards(doc)) {
-    if (quoteElements.includes(node)) {
-      const {text, nodes} = textAndNodesAfterNode(node);
-      const maybeSig = text.trim();
-      if (maybeSig.length > 0) {
-        if ((node.textContent || "").search(Utils.escapeRegExp(maybeSig)) >= 0) {
-          return nodes;
-        }
-      }
-      break;
+    let textAndNodes;
+    let focusNode = node;
+    if (node && quoteElements.includes(node)) {
+      textAndNodes = textAndNodesAfterNode(node);
+    } else if (node.previousSibling && quoteElements.includes(node.previousSibling)) {
+      focusNode = node.previousSibling;
+      textAndNodes = textAndNodesAfterNode(node.previousSibling);
+    } else {
+      continue;
     }
+
+    const {text, nodes} = textAndNodes;
+    const maybeSig = text.replace(/\s/g, "");
+    if (maybeSig.length > 0) {
+      if ((focusNode.textContent || "").replace(/\s/g, "").search(Utils.escapeRegExp(maybeSig)) >= 0) {
+        return nodes;
+      }
+    }
+    break;
   }
   return []
 }
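
Reviewer note: the key behavioral change in `unwrappedSignatureDetector` is that the candidate signature and the quote text are now compared with all whitespace removed, rather than with only the candidate trimmed. The sketch below isolates that comparison on plain strings so it can be tried without a DOM. The names `escapeRegExp`, `stripWhitespace`, and `looksLikeRepeatedSignature` are hypothetical stand-ins for illustration (the patch itself uses `Utils.escapeRegExp` and works on `textContent`), so treat this as an approximation of the idea, not the code in the patch.

```javascript
// Hypothetical stand-in for Utils.escapeRegExp: escapes regex metacharacters
// so the candidate signature is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove every whitespace character, mirroring text.replace(/\s/g, "")
// in the patch.
function stripWhitespace(str) {
  return (str || "").replace(/\s/g, "");
}

// Returns true when the text trailing the last quote block also appears
// inside that quote block, ignoring whitespace differences.
function looksLikeRepeatedSignature(quoteText, trailingText) {
  const maybeSig = stripWhitespace(trailingText);
  if (maybeSig.length === 0) {
    return false;
  }
  return stripWhitespace(quoteText).search(escapeRegExp(maybeSig)) >= 0;
}

// Example: the same signature, laid out with different line breaks.
const quoted = "Hi Evan,\n\n--\nName Last\nSc.B., Some Major\nSome University | Class of 2028\n";
const trailing = "\n--\nName Last Sc.B., Some Major\n  Some University | Class of 2028";
console.log(looksLikeRepeatedSignature(quoted, trailing));
```

With only `trim()` on the candidate, the differing line breaks in the example would prevent a match, which appears to be the situation the email_18 and email_23 fixtures exercise.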