[local-sync] Download message batches newest first

Summary:
In most cases (especially on Gmail and in the inbox on generic IMAP),
messages with higher UIDs are newer. Even where they are not the newest
messages by date, as can happen in other generic IMAP folders, they are
the messages most recently moved into that folder.

Our previous batching strategy downloaded the lowest UID in each batch
first. This was especially confusing when connecting a new account: the
first message to appear on screen could be hours or days old.

This patch changes the batching strategy in three ways:

1. Within a batch, we process downloaded messages from highest UID to
lowest UID.

2. We download batches in descending order of the highest UID each
batch contains.

3. We group more UIDs into a single batch by ignoring charset and
transfer-encoding on parts and grouping only by MIME part IDs. The part
ID is the only thing that has to be passed to the IMAP FETCH command;
the extra part data in the old batch key was unnecessary and was
probably included just for convenience.
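
The three changes above can be sketched in isolation as follows. This is
a simplified illustration, not the patch itself: `planBatches` and the
`{uid, partIDs}` record shape are hypothetical names, and the sketch
sorts UIDs up front, whereas the patch sorts the downloaded messages
after each FETCH returns.

```javascript
// Hypothetical helper: plan FETCH batches from {uid, partIDs} records.
function planBatches(messages) {
  // Group UIDs by the MIME part IDs they need (change 3).
  const uidsByPart = {};
  for (const {uid, partIDs} of messages) {
    const key = JSON.stringify(partIDs);
    uidsByPart[key] = uidsByPart[key] || [];
    uidsByPart[key].push(uid);
  }
  return Object.keys(uidsByPart)
    .map((key) => ({
      partIDs: JSON.parse(key),
      // Highest UID first within a batch (change 1).
      uids: uidsByPart[key].slice().sort((a, b) => b - a),
    }))
    // Batches with the highest UIDs first (change 2).
    .sort((a, b) => b.uids[0] - a.uids[0]);
}

const plan = planBatches([
  {uid: 10, partIDs: ['1']},
  {uid: 42, partIDs: ['2']},
  {uid: 11, partIDs: ['1']},
  {uid: 40, partIDs: ['2']},
]);
console.log(JSON.stringify(plan));
// prints [{"partIDs":["2"],"uids":[42,40]},{"partIDs":["1"],"uids":[11,10]}]
```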

Example old grouping:

  batch key: '[{"id":"2","transferEncoding":"QUOTED-PRINTABLE","charset":"UTF-8","mimeType":"text/html"}]'
  batch UIDs: [356416,356418,356420,356423,356432,356433,356435,356436,356437,356442,356444]

  batch key: '[{"id":"2","transferEncoding":"QUOTED-PRINTABLE","charset":"Windows-1252","mimeType":"text/html"}]'
  batch UIDs: [353777]

In the new strategy, all of these messages will be downloaded with the
same FETCH command, reducing IMAP round trips before message processing
begins.
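
As a rough check of that claim, the two old batch keys from the example
can be re-keyed by part ID alone, the same way the patch builds its new
key. The variable names here are illustrative only.

```javascript
// The two old batch keys from the example above.
const oldKeys = [
  '[{"id":"2","transferEncoding":"QUOTED-PRINTABLE","charset":"UTF-8","mimeType":"text/html"}]',
  '[{"id":"2","transferEncoding":"QUOTED-PRINTABLE","charset":"Windows-1252","mimeType":"text/html"}]',
];

// Re-key each batch using only the MIME part IDs.
const newKeys = oldKeys.map((key) =>
  JSON.stringify(JSON.parse(key).map((p) => p.id))
);

// Both old keys collapse to the same new key, so the UIDs share one batch.
const distinctNewKeys = Array.from(new Set(newKeys));
console.log(distinctNewKeys.length); // prints 1
```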

Fixes T7770

Test Plan: manual - connect a new account and confirm that the most recent message downloads first

Reviewers: mark, evan, juan

Reviewed By: juan

Maniphest Tasks: T7770

Differential Revision: https://phab.nylas.com/D3838
Christine Spang 2017-02-06 10:28:47 -08:00
parent 651cefb154
commit a22b3a1fc0


@ -220,8 +220,6 @@ class FetchMessagesInFolderIMAP extends SyncTask {
* `Interruptible`
*/
async * _fetchAndProcessMessages({min, max, uids} = {}) {
const uidsByPart = {};
const structsByPart = {};
let rangeQuery;
if (uids) {
if (min || max) {
@ -237,22 +235,35 @@ class FetchMessagesInFolderIMAP extends SyncTask {
// console.log(`FetchMessagesInFolderIMAP: Going to FETCH messages in range ${rangeQuery}`);
// We batch downloads by which MIME parts from the full message we want
// because we can fetch the same part on different UIDs with the same
// FETCH, thus minimizing round trips.
const uidsByPart = {};
const structsByUID = {};
const desiredPartsByUID = {};
yield this._box.fetchEach(rangeQuery, {struct: true}, ({attributes}) => {
const desiredParts = this._getDesiredMIMEParts(attributes.struct);
const key = JSON.stringify(desiredParts);
const key = JSON.stringify(desiredParts.map(p => p.id));
desiredPartsByUID[attributes.uid] = desiredParts;
structsByUID[attributes.uid] = attributes.struct;
uidsByPart[key] = uidsByPart[key] || [];
uidsByPart[key].push(attributes.uid);
structsByPart[key] = attributes.struct;
})
for (const key of Object.keys(uidsByPart)) {
// note: the order of UIDs in the array doesn't matter, Gmail always
// returns them in ascending (oldest => newest) order.
const desiredParts = JSON.parse(key);
// Prioritize the batches with the highest UIDs first, since these UIDs
// are usually the most recent messages
const maxUIDForBatch = {};
const partBatchesInOrder = Object.keys(uidsByPart)
for (const key of partBatchesInOrder) {
maxUIDForBatch[key] = Math.max(...uidsByPart[key]);
}
partBatchesInOrder.sort((a, b) => maxUIDForBatch[b] - maxUIDForBatch[a]);
for (const key of partBatchesInOrder) {
const desiredPartIDs = JSON.parse(key);
// headers are BIG (something like 30% of total storage for an average
// mailbox), so only download the ones we care about
const bodies = ['HEADER.FIELDS (FROM TO SUBJECT DATE CC BCC REPLY-TO IN-REPLY-TO REFERENCES MESSAGE-ID)'].concat(desiredParts.map(p => p.id));
const struct = structsByPart[key];
const bodies = ['HEADER.FIELDS (FROM TO SUBJECT DATE CC BCC REPLY-TO IN-REPLY-TO REFERENCES MESSAGE-ID)'].concat(desiredPartIDs);
const messagesToProcess = []
yield this._box.fetchEach(
@ -260,6 +271,8 @@ class FetchMessagesInFolderIMAP extends SyncTask {
{bodies},
(imapMessage) => messagesToProcess.push(imapMessage)
);
// generally higher UIDs are newer, so process those first
messagesToProcess.sort((a, b) => b.attributes.uid - a.attributes.uid);
// Processing messages is not fire and forget.
// We need to wait for all of the messages in the range to be processed
@ -268,11 +281,12 @@ class FetchMessagesInFolderIMAP extends SyncTask {
// queue to disk in case you quit the app and there are still messages
// left in the queue. Otherwise we would end up skipping messages.
for (const imapMessage of messagesToProcess) {
const uid = imapMessage.attributes.uid;
// This will resolve when the message is actually processed
await MessageProcessor.queueMessageForProcessing({
imapMessage,
struct,
desiredParts,
struct: structsByUID[uid],
desiredParts: desiredPartsByUID[uid],
folderId: this._folder.id,
accountId: this._db.accountId,
})