Vincent A Saulys' Blog
Promises in Javascript are Wild
Tags: parallelism programming
April 06, 2021

This concept bugged me and continues to bug me so I wanted to lay it out here.

Javascript -- specifically NodeJS here -- works async by default. This means that asynchronous operations get kicked off but are not waited on for completion before the next statement executes. This much is probably known to readers.
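As a toy illustration of what that means in practice:

// the callback is scheduled for later; the synchronous line
// below it runs first, even with a 0ms delay
setTimeout(() => console.log("second"), 0);
console.log("first");
// prints: first, then second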

It gets a bit weird though.

For instance, let's look at a real block of code I wrote this morning:

// ... don't worry about deps

const getImageAsBase64 = async (imageUrl) => {
  if (!imageUrl) return;
  // pull the file extension off the url; bail if there isn't one
  // (calling [1] directly on a failed match would throw)
  const match = imageUrl.match(/\.([0-9a-z]+)(?:[\?#]|$)/i);
  if (!match) return;
  const type = match[1];

  const r = await fetch(imageUrl); // node-fetch
  let buf = await r.buffer();
  buf = `data:image/${type};base64,` + buf.toString("base64");
  return buf;
};

extract("http://some-url.com")
  .then((article) => {
    // grab and inline all images
    const $ = cheerio.load(article.content);
    let images = [];

    /* * * * * this doesn't actually wait!!! * * * * */
    $("*")
      .find("img")
      .each(async (i, el) => {
        // gurrr
        const res = await getImageAsBase64(el.attribs?.src || el.attribs?.href);
        if (!res) return;

        images.push(res);
      });
    // logs 0 and undefined -- none of the pushes have happened yet
    console.log(images.length, images[0]);
    return images;
  })
  .catch((err) => {
    console.log(err);
  });

Javascript has a unique dual problem: unusual async syntax -- which is not intuitive to reason about -- combined with scoping problems!

The problem at heart here is that an async function is passed for each element (i.e. .each(...)). This callback fires once per element, and .each does not wait for any of them to complete. NodeJS moves forward and lets those pending operations run wild while it executes the next instruction. The await inside my async callback does not behave as I thought it would: it only pauses the callback itself, not the surrounding code, so it does not stop anything.

In fact, each callback returns a Promise -- the joys of dual notations -- which .each simply discards. Nothing blocks on waiting for them to actually get the image data, and waiting is exactly what I need here.
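To see why, here is roughly what an each-style iterator does with an async callback -- a simplified sketch, not cheerio's actual source:

const each = (items, callback) => {
  for (let i = 0; i < items.length; i++) {
    // an async callback hands back a Promise right here,
    // but nothing stores or awaits it
    callback(i, items[i]);
  }
};

The Promises get created, fired off, and immediately orphaned.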

My next thought was to force this to wait by turning the execution into an async function, through the bodge of wrapping it in one and calling it immediately (an IIFE). This also did not work.

extract("http://some-url.com")
  .then((article) => {
    // grab and inline all images
    const $ = cheerio.load(article.content);

    /* * * * * still no waiting * * * * */
    const images = (async () => {
      let images = [];

      // this await resolves with the cheerio selection itself;
      // .each still discards the Promises from its async callbacks
      const things = await $("*")
        .find("img")
        .each(async (i, el) => {
          const res = await getImageAsBase64(
            el.attribs?.src || el.attribs?.href
          );
          if (!res) return;

          images.push(res);
        });
      return images;
    })();

    // images out here is the IIFE's Promise, not the array,
    // so this logs undefined twice
    console.log(images.length, images[0]);
  })
  .catch((err) => {
    console.log(err);
  });

Of course it doesn't wait: the IIFE itself returns a Promise, so the outer images is a Promise rather than an array, and .each is still discarding the Promises its callbacks return. What is really needed is to kick off each image fetch, wait until I have them all, and then move forward.

That leads to the criminally ugly syntax of...

extract(listOfUrls[0])
  .then((article) => cheerio.load(article.content))
  .then(($) => {
    let images = [];
    $("*")
      .find("img")
      .each((i, el) => images.push(el.attribs?.src || el.attribs?.href));
    return images;
  })
  .then((images) => {
    console.log(images);
    return images;
  })
  .then((images) => images.map((cV) => getImageAsBase64(cV)))
  .then((promises) => Promise.all(promises)) // this is the ticket!
  .then((encodedStrings) => console.log(encodedStrings.length, encodedStrings))
  .catch((err) => {
    console.log(err);
  });

Ah yes, finally, that's the ticket! The .map fires off every fetch at once, and Promise.all resolves only after all of them have finished.
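For what it's worth, the same wait-for-them-all step reads a bit more naturally with async/await. A sketch using the same extract, cheerio, and getImageAsBase64 pieces from above (getInlinedImages is just my name for it):

const getInlinedImages = async (url) => {
  const article = await extract(url);
  const $ = cheerio.load(article.content);

  // collect the urls synchronously -- no async callback needed
  const srcs = [];
  $("*")
    .find("img")
    .each((i, el) => srcs.push(el.attribs?.src || el.attribs?.href));

  // fire off every fetch, then wait for all of them at once
  return Promise.all(srcs.map((src) => getImageAsBase64(src)));
};

You would still want to filter out the undefineds from images that had no usable source, same as before.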

Sidenote: Cheerio sucks at DOM manipulation. I highly recommend jsdom instead, which works much more like vanilla browser javascript would.
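To show what I mean -- standard browser idioms just work. A minimal sketch with a made-up bit of html:

const { JSDOM } = require("jsdom");

const dom = new JSDOM(`<p><img src="cat.png"></p>`);
// querySelectorAll and the .src setter behave like they do in a browser
dom.window.document.querySelectorAll("img").forEach((img) => {
  img.src = "data:image/png;base64,..."; // stand-in for real base64 data
});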

While it did work to get all my images, I then needed to put them back into the manipulated DOM. The idea was to replace all <img src=.../> tags with inlined base64 versions.

You could download all the images, find the matching img tag, and inline it there. That runs into two problems:

  1. you can't rely on order -- you need to know how to match them up
  2. you have to pass in the DOM element as you chain down

Matching on the original src url made the most sense to me.
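That means getImageAsBase64 can no longer return a bare string -- it has to hand back the original url alongside the encoded data. A sketch of the modified version, inferred from how imageUrl and buf get used below:

const getImageAsBase64 = async (imageUrl) => {
  if (!imageUrl) return;
  const match = imageUrl.match(/\.([0-9a-z]+)(?:[\?#]|$)/i);
  if (!match) return;

  const r = await fetch(imageUrl);
  const raw = await r.buffer();
  const buf = `data:image/${match[1]};base64,` + raw.toString("base64");

  // return the url too, so each result can be matched back to
  // its <img> tag no matter what order the fetches finish in
  return { imageUrl, buf };
};

With that in place, the passing of values gets very wonky: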

const getData = async (listOfUrls) => {
  // extract each article, inline its images, and return the html
  return Promise.all(
    listOfUrls.map((articleUrl) =>
      extract(articleUrl)
        .then((article) => {
          const dom = new JSDOM(article.content);
          return dom.window.document;
        })
        .then((document) => {
          let images = [];
          document.querySelectorAll("img").forEach((cV) => images.push(cV.src));
          return { document, images };
        })
        .then(({ document, images }) => ({
          document,
          promises: images.map((cV) => getImageAsBase64(cV))
        }))
        .then(async ({ document, promises }) => {
          let values = await Promise.all(promises);
          return { document, encodedStrings: values };
        })
        .then(({ document, encodedStrings }) => {
          // match each <img> back to its encoded result by original url
          document.querySelectorAll("img").forEach((cV) => {
            let newImgSrc = encodedStrings
              .filter((ccV) => ccV)
              .filter(
                (ccV) => cV.src === ccV.imageUrl || cV.href === ccV.imageUrl
              )[0];
            if (!newImgSrc) return;
            cV.src = newImgSrc.buf;
          });
          return document;
        })
        .then((document) => ({
          title: "test",
          data: document.documentElement.outerHTML
        }))
    )
  );
};
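Calling it is at least simple, despite the gnarly insides -- something like:

getData(["http://some-url.com"]).then((articles) => {
  articles.forEach(({ title, data }) => {
    console.log(title, data.length);
  });
});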

Now, I ended up not doing this -- the subtle reasons why I was able to change course require too much context to give here.

But I wanted to illustrate how crazy NodeJS gets even when you start from a logical beginning (e.g. how do I get all my images?). It feels like you need to rethink your app as it grows in even basic complexity. It doesn't scale with your thinking as naturally as, say, Python does.

Not sure why this is.

I will say that the premade libraries in NodeJS (e.g. ReactJS) work fantastically. Perhaps they benefit from having good patterns already worked out for you. Who knows.
