[local-sync] Download message batches newest first

Summary:
In most cases (and especially on Gmail and in the inbox on generic IMAP), messages with higher UIDs are newer. Even when they aren't the newest messages in other generic IMAP folders, they are the messages most recently moved into that folder.

Our previous batching strategy unfortunately downloaded the lowest UID in each batch first. This was especially confusing when connecting a new account: the first message to pop up on screen could be one from hours or days ago.

This patch changes the batching strategy in three ways:

1. Within a batch, we process downloaded messages from highest UID to lowest UID.
2. We download batches in descending order of the highest UID each batch contains.
3. We group more UIDs into a single batch by ignoring charset, transfer encoding, and MIME type on parts, and grouping only by MIME part IDs. The part IDs are the only thing you have to pass to the IMAP FETCH command. (No idea why we included this extra part data before; probably just convenience.)

Example old grouping:

batch key: '[{"id":"2","transferEncoding":"QUOTED-PRINTABLE","charset":"UTF-8","mimeType":"text/html"}]'
batch UIDs: [356416,356418,356420,356423,356432,356433,356435,356436,356437,356442,356444]

batch key: '[{"id":"2","transferEncoding":"QUOTED-PRINTABLE","charset":"Windows-1252","mimeType":"text/html"}]'
batch UIDs: [353777]

In the new strategy, all of these messages are downloaded with the same FETCH command, reducing IMAP round trips before message processing begins.

Fixes T7770

Test Plan: manual - connect a new account and see that the most recent message downloads first

Reviewers: mark, evan, juan

Reviewed By: juan

Maniphest Tasks: T7770

Differential Revision: https://phab.nylas.com/D3838
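The grouping change can be reproduced with the two batch keys from the example. This is a minimal sketch, not code from local-sync: `groupUIDs`, `oldKey`, and `newKey` are hypothetical names, and it assumes each message's desired parts look like the `{id, transferEncoding, charset, mimeType}` descriptors printed in the batch keys.

```javascript
// Hypothetical helper: group UIDs under whatever batch key keyFn produces.
// `messages` is an array of {uid, desiredParts} objects.
function groupUIDs(messages, keyFn) {
  const uidsByPart = {};
  for (const {uid, desiredParts} of messages) {
    const key = keyFn(desiredParts);
    uidsByPart[key] = uidsByPart[key] || [];
    uidsByPart[key].push(uid);
  }
  return uidsByPart;
}

// Old key: full part descriptors, so a charset difference splits batches.
const oldKey = (parts) => JSON.stringify(parts);
// New key: only the MIME part IDs, which is all FETCH needs.
const newKey = (parts) => JSON.stringify(parts.map(p => p.id));

// Sample data taken from the example batch keys and UIDs above.
const htmlPart = (charset) => [
  {id: '2', transferEncoding: 'QUOTED-PRINTABLE', charset, mimeType: 'text/html'},
];
const utf8UIDs = [356416, 356418, 356420, 356423, 356432, 356433,
  356435, 356436, 356437, 356442, 356444];
const messages = [
  ...utf8UIDs.map(uid => ({uid, desiredParts: htmlPart('UTF-8')})),
  {uid: 353777, desiredParts: htmlPart('Windows-1252')},
];

const oldBatches = groupUIDs(messages, oldKey);
const newBatches = groupUIDs(messages, newKey);
console.log(Object.keys(oldBatches).length); // 2
console.log(Object.keys(newBatches).length); // 1
console.log(newBatches['["2"]'].length);     // 12
```

With the old key, the single Windows-1252 message forces a second FETCH; with the new key, all twelve UIDs share the batch key `["2"]`.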
2024-11-11 10:12:00 +08:00 · 2017-02-06 10:28:47 -08:00 · 2017-02-06 10:28:47 -08:00 · a22b3a1fc0
commit a22b3a1fc0
parent 651cefb154
1 changed file with 26 additions and 12 deletions
--- a/packages/local-sync/src/local-sync-worker/sync-tasks/fetch-messages-in-folder.imap.es6
+++ b/packages/local-sync/src/local-sync-worker/sync-tasks/fetch-messages-in-folder.imap.es6
@@ -220,8 +220,6 @@ class FetchMessagesInFolderIMAP extends SyncTask {
   * `Interruptible`
   */
  async * _fetchAndProcessMessages({min, max, uids} = {}) {
-    const uidsByPart = {};
-    const structsByPart = {};
    let rangeQuery;
    if (uids) {
      if (min || max) {
@@ -237,22 +235,35 @@ class FetchMessagesInFolderIMAP extends SyncTask {

    // console.log(`FetchMessagesInFolderIMAP: Going to FETCH messages in range ${rangeQuery}`);

+    // We batch downloads by which MIME parts from the full message we want
+    // because we can fetch the same part on different UIDs with the same
+    // FETCH, thus minimizing round trips.
+    const uidsByPart = {};
+    const structsByUID = {};
+    const desiredPartsByUID = {};
    yield this._box.fetchEach(rangeQuery, {struct: true}, ({attributes}) => {
      const desiredParts = this._getDesiredMIMEParts(attributes.struct);
-      const key = JSON.stringify(desiredParts);
+      const key = JSON.stringify(desiredParts.map(p => p.id));
+      desiredPartsByUID[attributes.uid] = desiredParts;
+      structsByUID[attributes.uid] = attributes.struct;
      uidsByPart[key] = uidsByPart[key] || [];
      uidsByPart[key].push(attributes.uid);
-      structsByPart[key] = attributes.struct;
    })

-    for (const key of Object.keys(uidsByPart)) {
-      // note: the order of UIDs in the array doesn't matter, Gmail always
-      // returns them in ascending (oldest => newest) order.
-      const desiredParts = JSON.parse(key);
+    // Prioritize the batches with the highest UIDs first, since these UIDs
+    // are usually the most recent messages
+    const maxUIDForBatch = {};
+    const partBatchesInOrder = Object.keys(uidsByPart)
+    for (const key of partBatchesInOrder) {
+      maxUIDForBatch[key] = Math.max(...uidsByPart[key]);
+    }
+    partBatchesInOrder.sort((a, b) => maxUIDForBatch[b] - maxUIDForBatch[a]);
+
+    for (const key of partBatchesInOrder) {
+      const desiredPartIDs = JSON.parse(key);
      // headers are BIG (something like 30% of total storage for an average
      // mailbox), so only download the ones we care about
-      const bodies = ['HEADER.FIELDS (FROM TO SUBJECT DATE CC BCC REPLY-TO IN-REPLY-TO REFERENCES MESSAGE-ID)'].concat(desiredParts.map(p => p.id));
-      const struct = structsByPart[key];
+      const bodies = ['HEADER.FIELDS (FROM TO SUBJECT DATE CC BCC REPLY-TO IN-REPLY-TO REFERENCES MESSAGE-ID)'].concat(desiredPartIDs);

      const messagesToProcess = []
      yield this._box.fetchEach(
@@ -260,6 +271,8 @@ class FetchMessagesInFolderIMAP extends SyncTask {
        {bodies},
        (imapMessage) => messagesToProcess.push(imapMessage)
      );
+      // generally higher UIDs are newer, so process those first
+      messagesToProcess.sort((a, b) => b.attributes.uid - a.attributes.uid);

      // Processing messages is not fire and forget.
      // We need to wait for all of the messages in the range to be processed
@@ -268,11 +281,12 @@ class FetchMessagesInFolderIMAP extends SyncTask {
      // queue to disk in case you quit the app and there are still messages
      // left in the queue. Otherwise we would end up skipping messages.
      for (const imapMessage of messagesToProcess) {
+        const uid = imapMessage.attributes.uid;
        // This will resolve when the message is actually processed
        await MessageProcessor.queueMessageForProcessing({
          imapMessage,
-          struct,
-          desiredParts,
+          struct: structsByUID[uid],
+          desiredParts: desiredPartsByUID[uid],
          folderId: this._folder.id,
          accountId: this._db.accountId,
        })
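The two ordering rules in this diff (batches with the highest UID first, then higher UIDs first within a batch) can be sketched as a standalone function. This is a hypothetical illustration, not part of local-sync: `orderBatchesNewestFirst` and the sample data are invented, and the sketch sorts each batch's UID list directly as a stand-in for sorting the fetched messages by `attributes.uid`. It also deliberately differs from the diff in one place: it computes the maximum UID with a loop instead of `Math.max(...uids)`, because spreading a very large array into function arguments can fail (typically with a `RangeError`; the limit is engine-dependent).

```javascript
// Hypothetical sketch of the newest-first ordering used in this commit.
// uidsByPart maps a batch key (JSON array of MIME part IDs) to its UIDs.
function orderBatchesNewestFirst(uidsByPart) {
  const maxUIDForBatch = {};
  for (const [key, uids] of Object.entries(uidsByPart)) {
    // A plain loop avoids passing a huge array as spread arguments.
    let max = -Infinity;
    for (const uid of uids) {
      if (uid > max) max = uid;
    }
    maxUIDForBatch[key] = max;
  }
  // Batches whose highest UID is largest come first.
  return Object.keys(uidsByPart)
    .sort((a, b) => maxUIDForBatch[b] - maxUIDForBatch[a])
    .map(key => ({
      partIDs: JSON.parse(key),
      // Within a batch, higher (usually newer) UIDs are processed first.
      uids: [...uidsByPart[key]].sort((a, b) => b - a),
    }));
}

const ordered = orderBatchesNewestFirst({
  '["1"]': [10, 42, 7],
  '["1.2"]': [99, 3],
  '["2"]': [55],
});
console.log(ordered.map(b => b.partIDs[0])); // [ '1.2', '2', '1' ]
console.log(ordered[0].uids);                // [ 99, 3 ]
```

Because a single batch can now mix messages whose parts differ in charset or transfer encoding, the diff also stops keeping one `struct` per batch and instead looks up `structsByUID[uid]` and `desiredPartsByUID[uid]` for each message, so every message is still processed with its own part metadata.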