Back-off and retry using JavaScript arrays and promises

JavaScript's Array and Promise types compose nicely with functional programming idioms to control concurrency, without recourse to 3rd-party libraries. This post contrasts two such patterns that let you process data either concurrently or serially, with back-off and retry logic when errors occur.

Concurrent execution

Concurrent execution is the simple case, used when you don't have to worry about factors such as rate-limits or ordering.

Let's say you have an array of data that you want to send to a back-end metrics system. This system is under your control, and you know you won't be rate-limited:

const request = require('request-promise');

const { METRICS_API_KEY } = process.env;
const METRICS_ENDPOINT = 'https://example.com/metrics';

function send (data) {
  // Fire off every batch concurrently; Promise.all rejects if any request fails
  return Promise.all(
    data.map(batch => request(METRICS_ENDPOINT, {
      method: 'POST',
      formData: {
        api_key: METRICS_API_KEY,
        data: JSON.stringify(batch),
      },
    }).promise())
  );
}

In this case, you can just bang through the data as quickly as possible using Array.map and pay no heed to how that translates into network usage.

If you want to add retry logic, it's straightforward to do so by pulling the map operation out to a named function and calling it recursively in the error case:

const request = require('request-promise');

const { METRICS_API_KEY } = process.env;
const METRICS_ENDPOINT = 'https://example.com/metrics';
const RETRY_LIMIT = 3;

function send (data) {
  // The arrow wrapper matters here: data.map(sendBatch) would pass each
  // item's array index as the iteration argument
  return Promise.all(data.map(batch => sendBatch(batch)));
}

async function sendBatch (batch, iteration = 0) {
  try {
    return await request(METRICS_ENDPOINT, {
      simple: true,
      method: 'POST',
      formData: {
        api_key: METRICS_API_KEY,
        data: JSON.stringify(batch),
      },
    }).promise();
  } catch (error) {
    if (iteration === RETRY_LIMIT) {
      return error;
    }

    return sendBatch(batch, iteration + 1);
  }
}

It's worth calling out the termination condition here. All recursive functions need some kind of termination condition to prevent them spinning off into infinity. The Little Schemer, which is possibly the best book there is about recursion, suggests the termination condition should usually be the first item in any recursive function. I've broken that convention here because the condition returns the error response, which it wouldn't have access to in the other position. But at least it's the first item in the catch block and hopefully stands out clearly enough to anyone scanning the code.
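
For comparison, a termination-first version might look like the sketch below. Note it isn't a straight refactor: the error isn't in scope at the top of the function, so this sketch throws a fresh error instead of returning the response, which in turn would make Promise.all reject rather than resolve:

async function sendBatch (batch, iteration = 0) {
  // Termination condition first; iteration exceeds the limit only
  // after the final retry has already failed
  if (iteration > RETRY_LIMIT) {
    throw new Error(`failed to send batch after ${RETRY_LIMIT} retries`);
  }

  try {
    return await request(METRICS_ENDPOINT, {
      simple: true,
      method: 'POST',
      formData: {
        api_key: METRICS_API_KEY,
        data: JSON.stringify(batch),
      },
    }).promise();
  } catch (error) {
    return sendBatch(batch, iteration + 1);
  }
}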

It's also worth calling out that JavaScript's native promises leak memory in the presence of recursion. That won't be an issue if your recursion is shallow, but if any of your code is deeply recursive, you should consider switching to a more competent implementation.
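
If deep recursion is a concern here, sendBatch can also be restated as a loop, avoiding the recursive promise chain altogether. A sketch that keeps the same retry count and the same return-the-error behaviour as the recursive version:

async function sendBatch (batch) {
  for (let iteration = 0; iteration <= RETRY_LIMIT; iteration += 1) {
    try {
      return await request(METRICS_ENDPOINT, {
        simple: true,
        method: 'POST',
        formData: {
          api_key: METRICS_API_KEY,
          data: JSON.stringify(batch),
        },
      }).promise();
    } catch (error) {
      // On the final attempt, resolve with the error instead of
      // rethrowing, matching the recursive version
      if (iteration === RETRY_LIMIT) {
        return error;
      }
    }
  }
}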

Serial execution

When you want to call a 3rd-party service, things are less simple because you probably need to adhere to rate-limits. These dictate your behaviour when sending and also when backing off in the face of 429 responses.

Staying with the previous example, we can modify it to send data serially and take a break if the rate-limit is violated, like so:

const request = require('request-promise');

const { METRICS_API_KEY } = process.env;
const METRICS_ENDPOINT = 'https://example.com/metrics';
const RETRY_LIMIT = 3;
const BACKOFF_INTERVAL = 30000;

function send (data) {
  return data.reduce(async (promise, batch) => {
    // Await the accumulated promise so batches are sent one at a time
    const responses = await promise;

    responses.push(await sendBatch(batch));

    return responses;
  }, Promise.resolve([]));
}

async function sendBatch (batch, iteration = 0) {
  try {
    return await request(METRICS_ENDPOINT, {
      simple: true,
      method: 'POST',
      formData: {
        api_key: METRICS_API_KEY,
        data: JSON.stringify(batch),
      },
    }).promise();
  } catch (error) {
    if (iteration === RETRY_LIMIT) {
      return error;
    }

    if (error.statusCode === 429) {
      // Rate-limited: pause for the back-off interval before retrying
      return new Promise(resolve => {
        setTimeout(() => {
          sendBatch(batch, iteration + 1)
            .then(resolve);
        }, BACKOFF_INTERVAL);
      });
    }

    return sendBatch(batch, iteration + 1);
  }
}

Here the Array.map is changed to an Array.reduce, with a promise as the accumulator. Each iteration awaits that promise before doing anything else, forcing the loop to execute serially: each batch must finish before the next one begins.
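
If the pattern crops up repeatedly, it can be extracted into a generic helper. Here's a minimal sketch, where serialMap is a made-up name rather than anything built in:

function serialMap (array, operation) {
  return array.reduce(async (promise, item) => {
    // Await the accumulator so items are processed one at a time
    const results = await promise;

    results.push(await operation(item));

    return results;
  }, Promise.resolve([]));
}

With that in place, send is just a call to serialMap(data, sendBatch).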

Then further down, in the error-handling logic, a condition is added to check whether the error response has a 429 status code. If it does, the recursive call is delayed for 30 seconds and the whole loop is paused waiting for that back-off period to expire.
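
As an aside, that promisified setTimeout is a useful idiom on its own. Pulling it out to a helper flattens the nesting in the 429 branch; a quick sketch, with delay as a made-up name:

function delay (milliseconds) {
  return new Promise(resolve => {
    setTimeout(resolve, milliseconds);
  });
}

The back-off branch could then await the pause directly:

if (error.statusCode === 429) {
  await delay(BACKOFF_INTERVAL);
  return sendBatch(batch, iteration + 1);
}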

If the documented rate-limit were, say, 10 batches per second, you could take this approach a step further and pre-emptively seek to honour it without triggering the prohibitive 30-second back-off:

const request = require('request-promise');

const { METRICS_API_KEY } = process.env;
const METRICS_ENDPOINT = 'https://example.com/metrics';
const BATCH_INTERVAL = 100;
const RETRY_LIMIT = 3;
const BACKOFF_INTERVAL = 30000;

function send (data) {
  return data.reduce(async (promise, batch) => {
    const responses = await promise;

    responses.push(await sendBatch(batch));

    // Pre-emptively honour the rate-limit by pausing between batches
    await new Promise(resolve => {
      setTimeout(resolve, BATCH_INTERVAL);
    });

    return responses;
  }, Promise.resolve([]));
}

async function sendBatch (batch, iteration = 0) {
  try {
    return await request(METRICS_ENDPOINT, {
      simple: true,
      method: 'POST',
      formData: {
        api_key: METRICS_API_KEY,
        data: JSON.stringify(batch),
      },
    }).promise();
  } catch (error) {
    if (iteration === RETRY_LIMIT) {
      return error;
    }

    if (error.statusCode === 429) {
      // Rate-limited: pause for the back-off interval before retrying
      return new Promise(resolve => {
        setTimeout(() => {
          sendBatch(batch, iteration + 1)
            .then(resolve);
        }, BACKOFF_INTERVAL);
      });
    }

    return sendBatch(batch, iteration + 1);
  }
}

Here the sendBatch function remains unchanged, but the reducer is tweaked to add a short delay between batches.
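
To round things off, here's a hypothetical caller showing how the batches might be prepared; neither chunk nor events appears in the original code, they're purely illustrative:

function chunk (items, size) {
  const batches = [];

  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }

  return batches;
}

// Split raw events into batches of 10, matching the documented rate-limit
send(chunk(events, 10))
  .then(responses => console.log(`sent ${responses.length} batches`));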

Conclusion

So, in summary:

- Array.map with Promise.all processes batches concurrently, as fast as the network allows.
- Array.reduce with a promise as the accumulator processes batches serially, one at a time.
- Recursion (or a loop) expresses retry logic cleanly, with the iteration count as the termination condition.
- A promise-wrapped setTimeout provides the pauses for back-off and for pre-emptive rate-limiting.

If you want to see examples of this approach in production code, in mozilla/fxa-amplitude-send#81 I applied it to the metrics pipeline for Firefox Accounts and in mozilla/fxa-shared#56 I used it in our feature-flagging abstraction.