[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
/* eslint no-useless-escape: 0 */
|
2016-11-30 03:18:51 +08:00
|
|
|
const mimelib = require('mimelib');
|
2016-12-09 10:30:57 +08:00
|
|
|
const encoding = require('encoding');
|
2016-12-16 01:17:45 +08:00
|
|
|
const he = require('he');
|
2016-12-16 02:18:41 +08:00
|
|
|
const os = require('os');
|
|
|
|
const fs = require('fs');
|
2016-12-17 06:39:50 +08:00
|
|
|
const path = require('path');
|
2016-12-16 02:18:41 +08:00
|
|
|
const mkdirp = require('mkdirp');
|
2016-12-30 02:35:24 +08:00
|
|
|
const {Errors: {APIError}} = require('isomorphic-core');
|
2016-12-24 08:15:34 +08:00
|
|
|
const {N1CloudAPI, RegExpUtils, Utils} = require('nylas-exports');
|
2016-11-30 03:18:51 +08:00
|
|
|
|
2016-12-17 06:39:50 +08:00
|
|
|
// Aiming for the former in length, but the latter is the hard db cutoff
|
2016-12-09 10:30:57 +08:00
|
|
|
const SNIPPET_SIZE = 100;
|
|
|
|
const SNIPPET_MAX_SIZE = 255;
|
2016-11-30 03:18:51 +08:00
|
|
|
|
2016-12-17 06:39:50 +08:00
|
|
|
|
2017-01-11 04:04:58 +08:00
|
|
|
// Format of input: ['a@example.com, B <b@example.com>', 'c@example.com'],
|
|
|
|
// where each element of the array is the unparsed contents of a single
|
|
|
|
// element of the same header field. (It's totally valid to have multiple
|
|
|
|
// From/To/etc. headers on the same email.)
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
function parseContacts(input) {
|
2016-12-13 05:33:34 +08:00
|
|
|
if (!input || input.length === 0 || !input[0]) {
|
|
|
|
return [];
|
|
|
|
}
|
2017-01-11 04:04:58 +08:00
|
|
|
let contacts = [];
|
|
|
|
for (const headerLine of input) {
|
|
|
|
const values = mimelib.parseAddresses(headerLine);
|
|
|
|
if (!values || values.length === 0) {
|
|
|
|
continue;
|
2016-12-15 17:15:51 +08:00
|
|
|
}
|
2017-01-11 04:04:58 +08:00
|
|
|
contacts = contacts.concat(values.map(v => {
|
|
|
|
if (!v || v.length === 0) {
|
|
|
|
return null
|
|
|
|
}
|
|
|
|
const {name, address: email} = v;
|
|
|
|
return {name, email};
|
|
|
|
})
|
|
|
|
.filter(c => c != null))
|
|
|
|
}
|
|
|
|
return contacts;
|
2016-11-30 03:18:51 +08:00
|
|
|
}
|
|
|
|
|
2016-12-17 06:39:50 +08:00
|
|
|
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
function parseSnippet(body) {
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
const doc = new DOMParser().parseFromString(body, 'text/html')
|
|
|
|
const skipTags = new Set(['TITLE', 'SCRIPT', 'STYLE', 'IMG']);
|
|
|
|
const noSpaceTags = new Set(['B', 'I', 'STRONG', 'EM', 'SPAN']);
|
2016-12-14 04:42:38 +08:00
|
|
|
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
const treeWalker = document.createTreeWalker(doc, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT, (node) => {
|
|
|
|
if (skipTags.has(node.tagName)) {
|
|
|
|
// skip this node and all its children
|
|
|
|
return NodeFilter.FILTER_REJECT;
|
|
|
|
}
|
|
|
|
if (node.nodeType === Node.TEXT_NODE) {
|
|
|
|
const nodeValue = node.nodeValue ? node.nodeValue.trim() : null;
|
|
|
|
if (nodeValue) {
|
|
|
|
return NodeFilter.FILTER_ACCEPT;
|
2016-12-21 07:10:22 +08:00
|
|
|
}
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
return NodeFilter.FILTER_SKIP;
|
2016-12-14 04:42:38 +08:00
|
|
|
}
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
return NodeFilter.FILTER_ACCEPT;
|
|
|
|
});
|
2016-12-21 07:10:22 +08:00
|
|
|
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
let extractedText = "";
|
|
|
|
let lastNodeTag = "";
|
|
|
|
while (treeWalker.nextNode()) {
|
|
|
|
if (treeWalker.currentNode.nodeType === Node.ELEMENT_NODE) {
|
|
|
|
lastNodeTag = treeWalker.currentNode.nodeName;
|
|
|
|
} else {
|
|
|
|
if (extractedText && !noSpaceTags.has(lastNodeTag)) {
|
|
|
|
extractedText += " ";
|
|
|
|
}
|
|
|
|
extractedText += treeWalker.currentNode.nodeValue;
|
|
|
|
if (extractedText.length > SNIPPET_MAX_SIZE) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
2016-12-14 04:42:38 +08:00
|
|
|
}
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
const snippetText = extractedText.trim();
|
2016-12-14 04:42:38 +08:00
|
|
|
|
|
|
|
// clean up and trim snippet
|
2016-12-21 07:10:22 +08:00
|
|
|
let trimmed = snippetText.replace(/[\n\r]/g, ' ').replace(/\s\s+/g, ' ').substr(0, SNIPPET_MAX_SIZE);
|
2016-12-14 04:42:38 +08:00
|
|
|
if (trimmed) {
|
2016-12-17 06:39:50 +08:00
|
|
|
// TODO: strip quoted text from snippets also
|
2016-12-14 04:42:38 +08:00
|
|
|
// trim down to approx. SNIPPET_SIZE w/out cutting off words right in the
|
|
|
|
// middle (if possible)
|
|
|
|
const wordBreak = trimmed.indexOf(' ', SNIPPET_SIZE);
|
|
|
|
if (wordBreak !== -1) {
|
|
|
|
trimmed = trimmed.substr(0, wordBreak);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return trimmed;
|
|
|
|
}
|
|
|
|
|
2017-01-14 23:00:25 +08:00
|
|
|
// In goes arrays of text, out comes arrays of RFC2822 Message-Ids. Luckily,
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
// these days most all text in In-Reply-To, Message-Id, and References headers
|
|
|
|
// actually conforms to the spec.
|
|
|
|
function parseReferences(input) {
|
|
|
|
if (!input || !input.length || !input[0]) {
|
|
|
|
return [];
|
|
|
|
}
|
|
|
|
const references = new Set();
|
|
|
|
for (const headerLine of input) {
|
|
|
|
for (const ref of headerLine.split(/\s+/)) {
|
|
|
|
if (/^<.*>$/.test(ref)) {
|
|
|
|
references.add(ref);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return Array.from(references);
|
|
|
|
}
|
2016-12-17 06:39:50 +08:00
|
|
|
|
|
|
|
function htmlifyPlaintext(text) {
|
2016-12-16 01:17:45 +08:00
|
|
|
const escapedText = he.escape(text);
|
|
|
|
return `<pre class="nylas-plaintext">${escapedText}</pre>`;
|
2016-12-14 04:42:38 +08:00
|
|
|
}
|
|
|
|
|
2016-12-17 06:39:50 +08:00
|
|
|
|
|
|
|
function replaceMessageIdInBodyTrackingLinks(messageId, originalBody) {
|
|
|
|
const regex = new RegExp(`(${N1CloudAPI.APIRoot}.+?)MESSAGE_ID`, 'g')
|
|
|
|
return originalBody.replace(regex, `$1${messageId}`)
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
function stripTrackingLinksFromBody(originalBody) {
|
|
|
|
let body = originalBody.replace(/<img class="n1-open"[^<]+src="([a-zA-Z0-9-_:/.]*)">/g, () => {
|
|
|
|
return "";
|
|
|
|
});
|
|
|
|
body = body.replace(RegExpUtils.urlLinkTagRegex(), (match, prefix, url, suffix, content, closingTag) => {
|
|
|
|
const param = url.split("?")[1];
|
|
|
|
if (param) {
|
|
|
|
const link = decodeURIComponent(param.split("=")[1]);
|
|
|
|
return `${prefix}${link}${suffix}${content}${closingTag}`;
|
|
|
|
}
|
|
|
|
return match;
|
|
|
|
});
|
|
|
|
return body;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
function buildTrackingBodyForRecipient({baseMessage, recipient, usesOpenTracking, usesLinkTracking} = {}) {
|
|
|
|
const {id: messageId, body} = baseMessage
|
|
|
|
const encodedEmail = btoa(recipient.email)
|
|
|
|
.replace(/\+/g, '-')
|
|
|
|
.replace(/\//g, '_');
|
|
|
|
let customBody = body
|
|
|
|
if (usesOpenTracking) {
|
|
|
|
customBody = customBody.replace(/<img class="n1-open"[^<]+src="([a-zA-Z0-9-_:/.]*)">/g, (match, url) => {
|
|
|
|
return `<img class="n1-open" width="0" height="0" style="border:0; width:0; height:0;" src="${url}?r=${encodedEmail}">`;
|
|
|
|
});
|
|
|
|
}
|
|
|
|
if (usesLinkTracking) {
|
|
|
|
customBody = customBody.replace(RegExpUtils.urlLinkTagRegex(), (match, prefix, url, suffix, content, closingTag) => {
|
|
|
|
return `${prefix}${url}&r=${encodedEmail}${suffix}${content}${closingTag}`;
|
|
|
|
});
|
|
|
|
}
|
|
|
|
return replaceMessageIdInBodyTrackingLinks(messageId, customBody);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
function getReplyHeaders(messageReplyingTo) {
|
|
|
|
let inReplyTo;
|
|
|
|
let references;
|
|
|
|
if (messageReplyingTo.headerMessageId) {
|
|
|
|
inReplyTo = messageReplyingTo.headerMessageId;
|
|
|
|
if (messageReplyingTo.references) {
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
const refById = {};
|
|
|
|
for (const ref of messageReplyingTo.references) {
|
|
|
|
refById[ref.id] = ref;
|
|
|
|
}
|
|
|
|
references = [];
|
|
|
|
for (const referenceId of messageReplyingTo.referencesOrder) {
|
|
|
|
references.push(refById[referenceId].rfc2822MessageId);
|
|
|
|
}
|
|
|
|
if (!references.includes(messageReplyingTo.headerMessageId)) {
|
|
|
|
references.push(messageReplyingTo.headerMessageId);
|
|
|
|
}
|
2016-12-17 06:39:50 +08:00
|
|
|
} else {
|
|
|
|
references = [messageReplyingTo.headerMessageId];
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return {inReplyTo, references}
|
|
|
|
}
|
|
|
|
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
function bodyFromParts(imapMessage, desiredParts) {
|
|
|
|
let body = '';
|
2017-01-06 00:44:20 +08:00
|
|
|
for (const {id, mimeType, transferEncoding, charset} of desiredParts) {
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
let decoded = '';
|
2016-12-09 10:30:57 +08:00
|
|
|
// see https://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html
|
2017-01-21 05:54:26 +08:00
|
|
|
if ((/quot(ed)?[-/]print(ed|able)?/gi).test(transferEncoding)) {
|
|
|
|
decoded = mimelib.decodeQuotedPrintable(imapMessage.parts[id], charset);
|
|
|
|
} else if ((/base64/gi).test(transferEncoding)) {
|
|
|
|
decoded = mimelib.decodeBase64(imapMessage.parts[id], charset);
|
|
|
|
} else {
|
|
|
|
// Treat this as having no encoding and decode based only on the charset
|
2017-01-16 07:12:18 +08:00
|
|
|
//
|
|
|
|
// According to https://tools.ietf.org/html/rfc2045#section-5.2,
|
|
|
|
// this should default to ascii; however, if we don't get a charset,
|
|
|
|
// it's possible clients (like nodemailer) encoded the data as utf-8
|
|
|
|
// anyway. Since ascii is a strict subset of utf-8, it's safer to
|
|
|
|
// try and decode as utf-8 if we don't have the charset.
|
|
|
|
//
|
2017-01-21 05:54:26 +08:00
|
|
|
// (This applies to decoding quoted-printable and base64 as well. The
|
|
|
|
// mimelib library, if charset is null, will default to utf-8)
|
|
|
|
//
|
2017-01-16 07:12:18 +08:00
|
|
|
decoded = encoding.convert(imapMessage.parts[id], 'utf-8', charset).toString('utf-8');
|
2016-11-30 03:18:51 +08:00
|
|
|
}
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
// desiredParts are in order of the MIME tree walk, e.g. 1.1, 1.2, 2...,
|
|
|
|
// and for multipart/alternative arrays, we have already pulled out the
|
|
|
|
// highest fidelity part (generally HTML).
|
|
|
|
//
|
|
|
|
// Therefore, the correct way to display multiple parts is to simply
|
|
|
|
// concatenate later ones with the body of the previous MIME parts.
|
|
|
|
//
|
|
|
|
// This may seem kind of weird, but some MUAs _do_ send out whack stuff
|
|
|
|
// like an HTML body followed by a plaintext footer.
|
|
|
|
if (mimeType === 'text/plain') {
|
|
|
|
body += htmlifyPlaintext(decoded);
|
|
|
|
} else {
|
|
|
|
body += decoded;
|
|
|
|
}
|
2016-11-30 03:18:51 +08:00
|
|
|
}
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
// sometimes decoding results in a NUL-terminated body string, which makes
|
|
|
|
// SQLite blow up with an 'unrecognized token' error
|
|
|
|
body = body.replace(/\0/g, '');
|
|
|
|
|
|
|
|
return body;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Since we only fetch the MIME structure and specific desired MIME parts from
|
|
|
|
// IMAP, we unfortunately can't use an existing library like mailparser to parse
|
|
|
|
// the message, and have to do fun stuff like deal with character sets and
|
|
|
|
// content-transfer-encodings ourselves.
|
|
|
|
async function parseFromImap(imapMessage, desiredParts, {db, accountId, folder}) {
|
|
|
|
const {Message, Label} = db;
|
|
|
|
const {attributes} = imapMessage;
|
|
|
|
|
2016-12-23 07:42:55 +08:00
|
|
|
// this key name can change depending on which subset of headers we're downloading,
|
|
|
|
// so to prevent having to update this code every time we change the set,
|
|
|
|
// dynamically look up the key instead
|
|
|
|
const headerKey = Object.keys(imapMessage.parts).filter(k => k.startsWith('HEADER'))[0]
|
|
|
|
const headers = imapMessage.parts[headerKey].toString('ascii')
|
2016-12-23 00:51:16 +08:00
|
|
|
const parsedHeaders = mimelib.parseHeaders(headers);
|
2016-11-30 03:18:51 +08:00
|
|
|
for (const key of ['x-gm-thrid', 'x-gm-msgid', 'x-gm-labels']) {
|
|
|
|
parsedHeaders[key] = attributes[key];
|
|
|
|
}
|
|
|
|
|
2016-12-09 10:30:57 +08:00
|
|
|
const parsedMessage = {
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
to: parseContacts(parsedHeaders.to),
|
|
|
|
cc: parseContacts(parsedHeaders.cc),
|
|
|
|
bcc: parseContacts(parsedHeaders.bcc),
|
|
|
|
from: parseContacts(parsedHeaders.from),
|
|
|
|
replyTo: parseContacts(parsedHeaders['reply-to']),
|
2016-11-30 03:18:51 +08:00
|
|
|
accountId: accountId,
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
body: bodyFromParts(imapMessage, desiredParts),
|
2016-12-09 10:30:57 +08:00
|
|
|
snippet: null,
|
2016-11-30 03:18:51 +08:00
|
|
|
unread: !attributes.flags.includes('\\Seen'),
|
|
|
|
starred: attributes.flags.includes('\\Flagged'),
|
2017-01-11 04:04:58 +08:00
|
|
|
// We prefer the date from the message headers because the date is one of
|
|
|
|
// the fields we use for generating unique message IDs, and the server
|
|
|
|
// INTERNALDATE, `attributes.date`, may differ across accounts for the same
|
|
|
|
// message. If the Date header is not included in the message, we fall
|
|
|
|
// back to the INTERNALDATE and it's possible we'll generate different IDs
|
|
|
|
// for the same message delivered to different accounts (which is better
|
|
|
|
// than having message ID collisions for different messages, which could
|
|
|
|
// happen if we did not include the date).
|
2016-12-24 02:03:56 +08:00
|
|
|
date: parsedHeaders.date ? parsedHeaders.date[0] : imapMessage.attributes.date,
|
2016-11-30 03:18:51 +08:00
|
|
|
folderImapUID: attributes.uid,
|
2016-12-06 04:16:53 +08:00
|
|
|
folderId: folder.id,
|
2016-11-30 03:18:51 +08:00
|
|
|
folder: null,
|
|
|
|
labels: [],
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
headerMessageId: parseReferences(parsedHeaders['message-id'])[0],
|
|
|
|
// References are not saved on the message model itself, but are later
|
|
|
|
// converted to associated Reference objects so we can index them. Since we
|
|
|
|
// don't do tree threading, we don't need to care about In-Reply-To
|
|
|
|
// separately, and can simply associate them all in the same way.
|
|
|
|
// Generally, References already contains the Message-IDs in In-Reply-To,
|
|
|
|
// but we concat and dedupe just in case.
|
|
|
|
references: parseReferences((parsedHeaders.references || []).concat(
|
|
|
|
(parsedHeaders['in-reply-to'] || []), (parsedHeaders['message-id'] || []))),
|
2016-12-08 02:22:31 +08:00
|
|
|
gMsgId: parsedHeaders['x-gm-msgid'],
|
2016-12-23 07:42:55 +08:00
|
|
|
gThrId: parsedHeaders['x-gm-thrid'],
|
2016-12-17 02:42:28 +08:00
|
|
|
subject: parsedHeaders.subject ? parsedHeaders.subject[0] : '(no subject)',
|
2016-11-30 03:18:51 +08:00
|
|
|
}
|
2016-12-24 08:15:34 +08:00
|
|
|
// Inversely to `buildForSend`, we leave the date header as it is so that the
|
|
|
|
// format is consistent for the generative IDs, then convert it to a Date object
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
parsedMessage.id = Message.hash(parsedMessage)
|
2016-12-24 08:15:34 +08:00
|
|
|
parsedMessage.date = new Date(Date.parse(parsedMessage.date))
|
2016-11-30 03:18:51 +08:00
|
|
|
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
parsedMessage.snippet = parseSnippet(parsedMessage.body);
|
|
|
|
|
[local-sync] Correctly handle messages with non-alternative multipart text bodies
Summary:
It's possible to have multiple inline HTML parts in a message, or even
a multipart/alternative part that contains text and HTML, followed by a
plaintext signature. Previously, if there was more than one text part in
an email, we would pick the _last_ text/html or text/plain part that we
found, and treat that as the entire message body. This works most of the
time, but fails to display the full message body in some edge cases.
This patch fixes that by resolving multipart/alternative subparts to a
single part in the mimepart fetch stage, and then treating each desired
mime part separately when parsing the message, concatenating them if
there are multiple.
This makes K2's handling of multipart MIME message text better,
bug-wise, than the Python sync engine's, which has been mangling some
rare messages forever. (Example from my email: every email from the MIT
EECS Jobs List has never displayed the mailing list signature in N1.)
Note that this patch also removes our tentative support for PGP
encrypted messages. I'd rather add that back in later when I've dug up
some real example messages to test on, rather than leaving it in in its
current not-really-tested and probably not-really-working state, since
it makes it harder to make sure that the rest of the logic isn't broken.
Test Plan: manual for now - added examples of this to my growing list of regression tests to add to the message parser unit tests once I fix them
Reviewers: juan, evan
Reviewed By: evan
Differential Revision: https://phab.nylas.com/D3600
2017-01-06 09:18:18 +08:00
|
|
|
parsedMessage.folder = folder;
|
2016-12-09 10:30:57 +08:00
|
|
|
|
|
|
|
const xGmLabels = attributes['x-gm-labels']
|
2016-12-06 04:16:53 +08:00
|
|
|
if (xGmLabels) {
|
2016-12-09 10:30:57 +08:00
|
|
|
parsedMessage.folderImapXGMLabels = JSON.stringify(xGmLabels)
|
|
|
|
parsedMessage.labels = await Label.findXGMLabels(xGmLabels)
|
2016-12-06 04:16:53 +08:00
|
|
|
}
|
2016-11-30 03:18:51 +08:00
|
|
|
|
2016-12-16 02:18:41 +08:00
|
|
|
if (process.env.NYLAS_DEBUG) {
|
|
|
|
const outJSON = JSON.stringify({imapMessage, desiredParts, result: parsedMessage});
|
|
|
|
const outDir = path.join(os.tmpdir(), "k2-parse-output", folder.name)
|
|
|
|
const outFile = path.join(outDir, imapMessage.attributes.uid.toString());
|
|
|
|
mkdirp.sync(outDir);
|
|
|
|
fs.writeFileSync(outFile, outJSON);
|
|
|
|
}
|
|
|
|
|
2016-12-09 10:30:57 +08:00
|
|
|
return parsedMessage;
|
2016-11-30 03:18:51 +08:00
|
|
|
}
|
|
|
|
|
2016-12-09 09:48:34 +08:00
|
|
|
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
async function buildForSend(db, json) {
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
const {Thread, Message, Reference} = db
|
2016-12-09 09:48:34 +08:00
|
|
|
let replyToThread;
|
|
|
|
let replyToMessage;
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
|
|
|
|
if (json.thread_id != null) {
|
2016-12-09 09:48:34 +08:00
|
|
|
replyToThread = await Thread.find({
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
where: {id: json.thread_id},
|
2016-12-09 09:48:34 +08:00
|
|
|
include: [{
|
|
|
|
model: Message,
|
|
|
|
as: 'messages',
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
attributes: ['id'],
|
2016-12-09 09:48:34 +08:00
|
|
|
}],
|
|
|
|
});
|
|
|
|
}
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
|
|
|
|
if (json.reply_to_message_id != null) {
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
replyToMessage = await Message.findById(json.reply_to_message_id,
|
|
|
|
{ include: [{model: Reference, as: 'references', attributes: ['id', 'rfc2822MessageId']}] });
|
2016-12-09 09:48:34 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (replyToThread && replyToMessage) {
|
|
|
|
if (!replyToThread.messages.find((msg) => msg.id === replyToMessage.id)) {
|
2016-12-15 15:46:36 +08:00
|
|
|
throw new APIError(`Message ${replyToMessage.id} is not in thread ${replyToThread.id}`, 400)
|
2016-12-09 09:48:34 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
let thread;
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
let replyHeaders = {};
|
2017-01-17 11:41:43 +08:00
|
|
|
let inReplyToLocalMessageId;
|
2016-12-09 09:48:34 +08:00
|
|
|
if (replyToMessage) {
|
2017-01-17 11:41:43 +08:00
|
|
|
inReplyToLocalMessageId = replyToMessage.id;
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
replyHeaders = getReplyHeaders(replyToMessage);
|
|
|
|
thread = await replyToMessage.getThread();
|
2016-12-09 09:48:34 +08:00
|
|
|
} else if (replyToThread) {
|
|
|
|
thread = replyToThread;
|
|
|
|
const previousMessages = thread.messages.filter(msg => !msg.isDraft);
|
|
|
|
if (previousMessages.length > 0) {
|
|
|
|
const lastMessage = previousMessages[previousMessages.length - 1]
|
2017-01-17 11:41:43 +08:00
|
|
|
inReplyToLocalMessageId = lastMessage.id;
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
replyHeaders = getReplyHeaders(lastMessage);
|
2016-12-09 09:48:34 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
const {inReplyTo, references} = replyHeaders
|
2016-12-24 08:15:34 +08:00
|
|
|
const date = new Date()
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
const message = {
|
|
|
|
accountId: json.account_id,
|
|
|
|
threadId: thread ? thread.id : null,
|
|
|
|
headerMessageId: Message.buildHeaderMessageId(json.client_id),
|
|
|
|
from: json.from,
|
|
|
|
to: json.to,
|
|
|
|
cc: json.cc,
|
|
|
|
bcc: json.bcc,
|
|
|
|
replyTo: json.reply_to,
|
|
|
|
subject: json.subject,
|
|
|
|
body: json.body,
|
2017-01-13 23:46:10 +08:00
|
|
|
unread: false,
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
isDraft: json.draft,
|
|
|
|
isSent: false,
|
|
|
|
version: 0,
|
2016-12-24 08:15:34 +08:00
|
|
|
date: date,
|
2017-01-17 11:41:43 +08:00
|
|
|
inReplyToLocalMessageId: inReplyToLocalMessageId,
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
uploads: json.uploads,
|
|
|
|
}
|
2016-12-24 08:15:34 +08:00
|
|
|
// We have to clone the message and change the date for hashing because the
|
|
|
|
// date we get later when we parse from IMAP is a different format, per the
|
|
|
|
// nodemailer buildmail function that gives us the raw message and replaces
|
|
|
|
// the date header with this modified UTC string
|
|
|
|
// https://github.com/nodemailer/buildmail/blob/master/lib/buildmail.js#L470
|
|
|
|
const messageForHashing = Utils.deepClone(message)
|
|
|
|
messageForHashing.date = date.toUTCString().replace(/GMT/, '+0000')
|
|
|
|
message.id = Message.hash(messageForHashing)
|
2016-12-17 06:39:50 +08:00
|
|
|
message.body = replaceMessageIdInBodyTrackingLinks(message.id, message.body)
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
const instance = Message.build(message)
|
|
|
|
// we don't store these fields in the db, so if we set them on `message`
|
|
|
|
// beforehand Message.build() would drop them---this is just a convenience
|
|
|
|
instance.inReplyTo = inReplyTo;
|
|
|
|
instance.references = references;
|
|
|
|
return instance;
|
2016-12-09 09:48:34 +08:00
|
|
|
}
|
|
|
|
|
2016-11-30 03:18:51 +08:00
|
|
|
module.exports = {
|
[local-sync, cloud-api, cloud-workers] Fix msg id collision, tracking and sending issues, some refactoring
Summary:
This diff solves a few separate issues from T7313, T7316, T7282, and it refactors
the send code a little bit.
Initially, the problem that led to this diff was generating message ids that
wouldn't collide (which was causing errors in the message-processor). Collisions
in ids were being caused by messages that contained the exact same participants,
subject and date (most likely due bots or scripts sending emails in quick
succession)
To prevent collisions this commit adds the `message-id` header as part of the
database message id, and ensures that we set it correctly before sending, and
that it remains consistent through send, multi-send, and the sync loop.
During the refactor and review, I removed some code that assumed that we were
syncing drafts (which we aren't), and also fixes a few other known and
unknown issues around sending, message creation, and tracking, like assigning
the correct date header (we were previously assigning the draft creation date
from within N1), fixing the tracking regex, among other smaller bugs/typos.
Will address inline TODOs in a separate diff
Test Plan: TODO!!! I will add tests in another diff
Reviewers: evan, halla, jackie, khamidou
Reviewed By: halla, jackie
Differential Revision: https://phab.nylas.com/D3507
2016-12-15 11:35:19 +08:00
|
|
|
buildForSend,
|
2017-01-17 11:41:43 +08:00
|
|
|
getReplyHeaders,
|
2016-11-30 03:18:51 +08:00
|
|
|
parseFromImap,
|
[local-sync] For generic IMAP, Thread based on Message-Id, In-Reply-To & References
Summary:
This swaps out our generic IMAP threading mechanism to use the threading
headers on the message instead of the prior way of grouping by subject
and then differentiating based on participants, as that design was
somewhat driven by what we could accomplish easily given legacy data
schema decisions and has serious caveats, such as different threads between
the same people with the same subject being misthreaded together. With K2, we
have free reign to change the data format, so we can do it right.
The algorithm is super simple:
- Define "references" as the union of the Message-Id, In-Reply-To, and
References headers on a message, filtered for valid RFC2822 Message-IDs
- On message sync, if any element of the new message's references
matches any element of an existing message's references, thread them
together
In order to accomplish this, we need to store References in a way that
allows each element to be indexed for fast lookup. That meant either
using the sqlite JSON1 extension + expression-based indices, or creating
a new table. I chose the latter as a time-tested and simple solution,
since we don't need the flexibility of JSON here.
Test Plan: manual - unit tests coming
Reviewers: khamidou, evan, juan
Reviewed By: evan, juan
Differential Revision: https://phab.nylas.com/D3651
2017-01-08 06:31:28 +08:00
|
|
|
parseSnippet,
|
|
|
|
parseContacts,
|
2016-12-17 06:39:50 +08:00
|
|
|
stripTrackingLinksFromBody,
|
|
|
|
buildTrackingBodyForRecipient,
|
|
|
|
replaceMessageIdInBodyTrackingLinks,
|
2016-11-30 03:18:51 +08:00
|
|
|
}
|