Node.js control flow - an overview

A beginner's guide to asynchronicity

I’ve found that most explanations of how Node.js handles asynchronous control flow are difficult to understand because they assume either 1) a deep knowledge of computer science or 2) that the reader is only interested in Node.js, not in the foundational pieces it is built on.

This post is my attempt to fill this gap.

Concurrency in general

Concurrency is everywhere in the modern programming landscape:

  • A web site might have to handle many simultaneous users.
  • A set of APIs that need to coordinate transactions might be distributed across many computers in a cloud computing environment.
  • An IDE might compile code in the background as a developer writes their code.

As Rob Pike, one of the creators of Golang, put it:

 "Concurrency is the composition of independently executing things (typically functions)."

Unfortunately, good code design is hard in a concurrent environment - period. In concurrent programs, a developer has to manage multiple activities with overlapping timelines. Even with a set of relatively simple components, complex communication patterns or poor composition can lead to race conditions. Race conditions are notoriously hard to test and debug since they are often affected by environment conditions like network traffic, the state of the operating system, and even hardware usage.


Node.js and threading

As stated in MDN documentation, JavaScript tries to simplify the problems of concurrent code by providing a single-threaded event loop. This event loop is responsible for:

  • Executing the code.
  • Collecting and processing events.
  • Executing queued sub-tasks.

In particular, once a function starts, it runs to completion before any other part of your program can run - you know that no other code will jump in and corrupt the data. But if you want to build a web server, for example, concurrency still needs to be addressed.
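A quick sketch of this run-to-completion behavior: the callback queued by setTimeout cannot run until the currently executing code finishes, even with a delay of 0 ms.

    // The queued callback cannot run until the current code finishes,
    // even though its delay is 0 ms.
    setTimeout(() => console.log('timer callback'), 0);

    for (let i = 0; i < 3; i++) {
      console.log(`sync loop ${i}`);
    }

    // Output:
    // sync loop 0
    // sync loop 1
    // sync loop 2
    // timer callback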

Asynchronous by default

To provide concurrency, time-consuming operations (in particular I/O events and I/O callbacks) in Node.js are asynchronous by default. In order to achieve this asynchronous behavior, modern Node.js utilizes the event loop as a central dispatcher that routes requests to C++ and returns those results to JavaScript.
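As a minimal illustration, Node.js's fs module exposes both styles - the asynchronous fs.readFile returns immediately, while the explicitly synchronous fs.readFileSync blocks until the read completes:

    const fs = require('fs');

    // Asynchronous by default: readFile returns immediately, and the
    // callback runs later, once libuv completes the read.
    fs.readFile(__filename, 'utf8', (err, data) => {
      if (err) throw err;
      console.log(`async read finished: ${data.length} characters`);
    });

    // The explicitly synchronous variant blocks the event loop instead.
    const contents = fs.readFileSync(__filename, 'utf8');
    console.log(`sync read finished: ${contents.length} characters`);

    // Output order:
    // sync read finished: ...
    // async read finished: ...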

The decision to take this approach with I/O was baked into Node.js from the start. Ryan Dahl (the creator of Node.js), inspired by NGINX, believed at the time that, "Maybe we were doing I/O wrong and that maybe if we did everything in a non-blocking way, that we would solve a lot of the difficulties with programming. Like, perhaps we could forget about threads entirely and only use process abstractions and serialized communications."

In the end, however, Node.js could not dispense with threading.

To understand where Node.js utilizes threads for concurrency and where it relies on other means, it is important to understand that one level down in the Node.js architecture sits libuv, a C library built specifically for Node.js. Libuv handles the following operations:

  • Full-featured event loop backed by epoll, kqueue, IOCP, event ports
  • Asynchronous TCP and UDP sockets
  • Asynchronous DNS resolution
  • Asynchronous file and file system operations
  • File system events
  • Child processes
  • Thread pool
  • Signal handling
  • Threading and synchronization primitives

Some of the items in this list allow libuv (and Node.js) to avoid concerning itself with threading at all - for example, the use of O/S event notification systems like epoll and IOCP to handle web server operations.

On the other hand, certain operations in libuv (and Node.js) are synchronous and threaded. As described in Don't Block The Event Loop:

"Node.js has two types of threads: one Event Loop and k Workers. The Event Loop is responsible for JavaScript callbacks and non-blocking I/O, and a Worker executes tasks corresponding to C++ code that completes an asynchronous request, including blocking I/O and CPU-intensive work. Both types of threads work on no more than one activity at a time."

Node.js uses the Worker Pool to handle 'expensive' tasks. This includes I/O for which an operating system does not provide a non-blocking version, as well as particularly CPU-intensive tasks.

These are the Node.js module APIs that make use of this Worker Pool:

  1. I/O-intensive
    1. DNS: dns.lookup(), dns.lookupService().
    2. File System: All file system APIs except fs.FSWatcher() and those that are explicitly synchronous use libuv's thread pool.
  2. CPU-intensive
    1. Crypto: crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair().
    2. Zlib: All zlib APIs except those that are explicitly synchronous use libuv's thread pool.
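As a small demonstration of the Worker Pool in action, crypto.pbkdf2 (from the list above) is handed off to libuv's thread pool, leaving the event loop free while the key is derived:

    const crypto = require('crypto');

    // The key derivation is CPU-intensive, so libuv runs it on a worker
    // thread; the event loop keeps running in the meantime.
    crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
      if (err) throw err;
      console.log(`derived key: ${key.toString('hex').slice(0, 16)}...`);
    });

    console.log('the event loop is not blocked while pbkdf2 runs');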

You want threads, you got them!

Node.js also provides a separate Worker Threads module (worker_threads) for situations in which developers wish to perform CPU-intensive JavaScript operations (for example, file compression); it is not intended for I/O operations. The Worker Threads module allows developers to create their own custom pool of threads and lets those threads communicate through shared memory.
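Here is a minimal sketch of the worker_threads API; the recursive fib function is just a stand-in for any CPU-intensive JavaScript work:

    const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

    if (isMainThread) {
      // Main thread: spawn a worker so the event loop stays responsive.
      const worker = new Worker(__filename, { workerData: 35 });
      worker.on('message', (result) => console.log(`fib(35) = ${result}`));
      worker.on('error', (err) => console.error(err));
    } else {
      // Worker thread: run the CPU-intensive computation and post the
      // result back to the main thread.
      const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
      parentPort.postMessage(fib(workerData));
    }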

The bottom line is that:

  • For many of the use cases in which Node.js excels (like creating web servers), developers really don't have to think about threading and can focus on the "process abstractions."
  • But threading is there for the limited use cases that need it.

The journey from Callback to Async/Await

The journey from callbacks to async/await in Node.js reflects its tight relationship with V8 - Google's open source JavaScript and WebAssembly engine, written in C++. As V8 adds functionality, both the Chrome browser and Node.js integrate the changes into their codebases.

Callbacks - A first try at taming asynchronous complexity

Initially, V8 (and Node.js) handled asynchronous code using a continuation-passing style (CPS) pattern. This pattern originally emerged in the mid-1970s in the Scheme language:

"A function written in continuation-passing style takes an extra argument: an explicit "continuation", i.e. a function of one argument. When the CPS function has computed its result value, it "returns" it by calling the continuation function with this value as the argument. That means that when invoking a CPS function, the calling function is required to supply a procedure to be invoked with the subroutine's "return" value."  - "Continuation-passing style" on Wikipedia (CC BY-SA 3.0)

In JavaScript/Node.js, we know this as the callback pattern. While callbacks work well enough for simple programs, most real-life situations make them error-prone. For example, suppose you have a web page that needs to retrieve remote data and then, depending on that data, load more data and a set of images. In this case, you can end up in "callback hell" - deeply nested callbacks with hard-to-understand success and failure paths. This is the classic "pyramid of doom" (as illustrated in the Mozilla guide on using promises):

    doSomething(function(result) {
      doSomethingElse(result, function(newResult) {
        doThirdThing(newResult, function(finalResult) {
          console.log('Got the final result: ' + finalResult);
        }, failureCallback);
      }, failureCallback);
    }, failureCallback);

Promises - Simplifying further

So, how to avoid the pyramid of doom? Enter Promises.

The idea of promises had been around for a while. About the same time CPS emerged in Scheme, alternative approaches to handling asynchronicity emerged under different names in a variety of programming languages: futures, defers, etc. In 1988, Barbara Liskov and Liuba Shrira (while doing research for DARPA) coined the term promises to describe a construct that, "Support(s) an efficient asynchronous remote procedure call mechanism for use by components of a distributed program. A promise is a place holder for a value that will exist in the future. It is created at the time a call is made. The call computes the value of the promise, running in parallel with the program that made the call. When it completes, its results are stored in the promise and can then be 'claimed' by the caller."

Released in January 2014, Chrome Version 32 introduced support for promises. Node.js followed suit in February 2015 with promise support in 0.12.

Much like the vision described in 1988, JavaScript promises are objects that allow you to compose asynchronous tasks, run them in parallel, and pipeline the calls together:

    doSomething()
      .then(result => doSomethingElse(result))
      .then(newResult => doThirdThing(newResult))
      .then(finalResult => {
        console.log(`Got the final result: ${finalResult}`);
      })
      .catch(failureCallback);

At a high level, promises have three states:

    // A new promise starts in the "pending" state
    new Promise(function (resolve, reject) {
      // reject() transitions the promise to the "rejected" state
      reject(new Error('Transition to a "rejected" state'));

      // A promise settles only once, so this call to resolve(), which
      // would otherwise transition it to the "fulfilled" state, is ignored
      resolve({ message: 'Transition to a "fulfilled" state' });
    });

As a developer, you can do more than chain promises in Node.js - you can use a number of methods to compose promises in order to more easily manage groups of asynchronous tasks:

  • Promise.all - This method is typically used when there are multiple asynchronous tasks that are dependent on one another to complete successfully. Promise.all takes an iterable of promises as an input, and returns a single promise as an output. This returned promise will resolve when all of the input's promises have resolved and non-promises have returned, or if the input iterable contains no promises. It rejects immediately upon any of the input promises rejecting or non-promises throwing an error, and will reject with this first rejection message / error.
  • Promise.race - This method returns a promise that fulfills or rejects as soon as one of the promises in an iterable fulfills or rejects, with the value or reason from that promise (a sketch of Promise.all and Promise.race follows this list).
  • Promise.allSettled - This method is typically used when you have multiple asynchronous tasks that are not dependent on one another to complete successfully, or you'd always like to know the result of each promise. Promise.allSettled() returns a promise that resolves after all of the given promises have either fulfilled or rejected, with an array of objects that each describes the outcome of each promise - for example:
    // asynchronous processing
    // we will talk about async/await syntax in a minute
    async function processor(message) {
      return await someOperation(message);
    }

    // initial call
    // inside of Promise.allSettled, map all messages to the asynchronous processor call
    const results = await Promise.allSettled(messages.map(message => processor(message)));

    // values returned to results
    // [
    //   { "status": "rejected", "reason": "my first promise call failed" },
    //   { "status": "fulfilled", "value": "success" }
    // ]
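For completeness, here is a small sketch of Promise.all and Promise.race as well; fetchUser and fetchOrders are hypothetical promise-returning functions used only for illustration:

    // fetchUser and fetchOrders are hypothetical promise-returning functions.

    // Promise.all: rejects as soon as any input rejects, otherwise
    // resolves with an array of all the results.
    Promise.all([fetchUser(), fetchOrders()])
      .then(([user, orders]) => console.log(user, orders))
      .catch(err => console.error('first rejection wins:', err));

    // Promise.race: settles as soon as the first input settles - a
    // common use is imposing a timeout on a slow operation.
    const timeout = new Promise((resolve, reject) =>
      setTimeout(() => reject(new Error('timed out')), 5000));

    Promise.race([fetchUser(), timeout])
      .then(user => console.log(user))
      .catch(err => console.error(err));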

Note: Promise.allSettled was only introduced in Node.js 12.9.0 (released in August 2019). Also, while composing promises using native Node.js functionality minimizes your dependency tree, it can come with performance sacrifices - for example, the Bluebird promise library can be up to four times faster, according to at least one set of tests.

Between the ability to chain Promises and the ability to compose multiple Promise chains, V8 (and Node.js) eliminated much of the complexity that was inherent in the CPS/callback style of programming without sacrificing any of the functionality.

Async/Await - Making promises look like synchronous code and improving performance

V8 introduced async/await in late 2016 as an alternative to the more complex promise syntax, and Node.js began supporting it by default with v7.6 in early 2017; by October 2017, the syntax was also available in a long-term support release (Node 8).

Async/await structures asynchronous code so that it reads almost like synchronous code. That being said, behind the scenes, code using async/await operates in much the same way as promises:

"Await expressions suspend progress through an async function, yielding control and subsequently resuming progress only when an awaited promise-based asynchronous operation is either fulfilled or rejected. The resolved value of the promise is treated as the return value of the await expression. Use of async / await enables the use of ordinary try / catch blocks around asynchronous code." - “async function - JavaScript” by Mozilla Contributors (CC BY-SA 4.0)

For example:

    async function foo() {
      try {
        const result = await doSomething();
        const newResult = await doSomethingElse(result);
        const finalResult = await doThirdThing(newResult);
        console.log(`Got the final result: ${finalResult}`);
      } catch (error) {
        failureCallback(error);
      }
    }

There are a number of advantages to using async/await over the original promise setup, where possible:

  • async/await code is certainly more readable than CPS/callback style, or even the promise syntax.
  • async/await code outperforms hand-written promise code - but only for sequential promise chains - at least since V8's optimizations of November 2018. That being said, the difference is relatively minor - for example, when I ran the V8 Promise Performance Tests against Node.js v12.16.2, I got the following results:

    Time(doxbee-async-es2017-native): 19.9 ms.
    Time(doxbee-promises-es2015-native): 26.3 ms.

More details on the current state of promise performance can be found in this nice write-up by the Kuzzle team.

Async/Await and promises - A word of caution

Even with the arrival of async/await, simplifying asynchronous code continues to be a challenge. For example, core Node.js contributor James Snell provides seven recommendations to avoid broken promises:

  • Know when your code is executed
  • Do not use unexpected promises
  • Avoid mixing promises and callbacks (see the sketch after this list)
  • Don't create promises in loops
  • Understand that synchronous promises are useless
  • Avoid long .then() chains
  • Avoid overthinking it
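To illustrate just one of these recommendations - avoiding mixed promise and callback styles - Node.js's built-in util.promisify wraps a callback-style function so that it returns a promise, keeping the calling code in a single style:

    const util = require('util');
    const fs = require('fs');

    // Convert the callback-style fs.readFile into a promise-returning
    // function so the calling code can stay purely async/await.
    const readFileAsync = util.promisify(fs.readFile);

    async function main() {
      const data = await readFileAsync(__filename, 'utf8');
      console.log(`read ${data.length} characters`);
    }

    main().catch(console.error);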

Generators and iterators - Another way to yield control

As mentioned above, "Await expressions suspend progress through an async function, yielding control and subsequently resuming progress only when an awaited promise-based asynchronous operation is either fulfilled or rejected."

Another way to yield control in V8 and Node.js is through generators. While the direct use of generators is relatively rare in modern Node.js code, they are still worth understanding in case you ever stumble across them.

Generators as coroutines

In order to understand generators and how they relate to control flow, it is important to understand the concepts of subroutines and coroutines.

A subroutine is a block of code (a function) that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. When a subroutine is invoked, execution begins at the start, and once the subroutine exits, it is finished; an instance of a subroutine returns only once and does not hold state between invocations. For example, in JavaScript a synchronous subroutine might look like the following:

    function add(x, y) {
      return x + y;
    }

Subroutines are a staple of structured programming, a paradigm that many of us learned early in our education as programmers.

  • A coroutine is also a block of code. Unlike subroutines, however, coroutines generalize subroutines for cooperative multitasking by allowing execution to be suspended and resumed. Coroutines voluntarily yield control periodically so that multiple tasks can run concurrently. A coroutine can exit by calling another coroutine, which may later return to the point where it was invoked in the original routine.
  • A generator is a "semi-coroutine" - a subset of coroutines.

Both generators and coroutines can yield multiple times, suspending their execution and allowing re-entry at multiple points. A generator simply transfers control back to the generator's caller and passes a value. In the JavaScript world, all generators are also iterators:

    // code example borrowed from Arfat Salman at
    // https://codeburst.io/understanding-generators-in-es6-javascript-with-examples-6728834016d5

    function* generatorFunction() {
      console.log('This will be executed first.');
      yield 'Hello, ';
      console.log('I will be printed after the pause');
      yield 'World!';
    }

    const generatorObject = generatorFunction();
    console.log(generatorObject.next().value);
    console.log(generatorObject.next().value);
    console.log(generatorObject.next().value);

    /*********************************************/
    /* Output begins                             */
    /*********************************************/

    // This will be executed first.
    // Hello,
    // I will be printed after the pause
    // World!
    // undefined

    /*********************************************/
    /* Output ends                               */
    /*********************************************/

A thread about Python on StackOverflow puts it well:

“A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.” - “Difference between function and generator?” on StackOverflow by no coder (CC BY-SA 4.0)

What is an iterator?

As mentioned, generators are always iterators. But what exactly is an iterator? An iterator usually takes the form of a code reference that, when executed, computes the next item in a container and returns it. When the iterator reaches the end of the container, it returns an agreed-upon value.
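In JavaScript, the iterator protocol makes this concrete: an iterator is any object with a next() method that returns { value, done } objects. A minimal sketch (makeRangeIterator is a made-up helper):

    // An iterator is any object with a next() method that returns
    // { value, done } objects.
    function makeRangeIterator(start, end) {
      let current = start;
      return {
        next() {
          return current < end
            ? { value: current++, done: false }
            : { value: undefined, done: true }; // the agreed-upon "end" value
        },
      };
    }

    const it = makeRangeIterator(1, 4);
    console.log(it.next()); // { value: 1, done: false }
    console.log(it.next()); // { value: 2, done: false }
    console.log(it.next()); // { value: 3, done: false }
    console.log(it.next()); // { value: undefined, done: true }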

Iterators and iterables are actually scattered throughout JavaScript. For example, behind the scenes spread syntax utilizes iteration:

    const someString = 'hi';
    console.log([...someString]); // ["h", "i"]

That being said, perhaps the most common use of iterable values in JavaScript is the for...of loop. For example, since arrays are iterable, the following works as expected:

    for (const element of [10, 12, 17, 19]) {
      console.log(element);
    }

The following containers are all built-in iterables, because each of their prototype objects implements an @@iterator method (see the custom-iterable sketch after this list):

  • String
  • Array
  • Map
  • Set
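You can make your own objects iterable by implementing that same @@iterator method (exposed in code as Symbol.iterator). A minimal sketch:

    // Any object whose [Symbol.iterator]() returns an iterator is
    // iterable, so for...of and spread syntax work on it.
    const range = {
      from: 1,
      to: 3,
      [Symbol.iterator]() {
        let current = this.from;
        const last = this.to;
        return {
          next: () =>
            current <= last
              ? { value: current++, done: false }
              : { value: undefined, done: true },
        };
      },
    };

    console.log([...range]); // [1, 2, 3]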

Likewise, a number of APIs in JavaScript accept iterables, including:

  • Map
  • Set
  • Promise.all
  • Promise.allSettled
  • Promise.race
  • Array.from

Generators - Why?

In general, custom iterators and generators are not something you need to implement - you can mostly utilize a combination of JavaScript’s built-in iterators and async/await syntax. This raises an obvious question: why were generators ever implemented in the language?

In the days before async/await, generators - with the help of libraries like co - gave developers a cleaner way to deal with asynchronous code:

    // code example borrowed from Arfat Salman at
    // https://codeburst.io/understanding-generators-in-es6-javascript-with-examples-6728834016d5

    /* original code written with promises */
    function fetchJson(url) {
      return fetch(url)
        .then(request => request.text())
        .then(text => JSON.parse(text))
        .catch(error => {
          console.log(`ERROR: ${error.stack}`);
        });
    }

    /* using the co library to emulate async/await with a generator */
    const fetchJson = co.wrap(function* (url) {
      try {
        let request = yield fetch(url);
        let text = yield request.text();
        return JSON.parse(text);
      } catch (error) {
        console.log(`ERROR: ${error.stack}`);
      }
    });

    /* the same call using plain old async/await */
    async function fetchJson(url) {
      try {
        let request = await fetch(url);
        let text = await request.text();
        return JSON.parse(text);
      } catch (error) {
        console.log(`ERROR: ${error.stack}`);
      }
    }

Likewise, before V8 optimized the underlying implementation in November 2018, well-written generator code could actually outperform similar async/await code. For example, see this factorial test from 2015. All of this being said, async/await is the dominant asynchronous paradigm in Node.js in 2020.

A final note - Simple stuff

Like most languages, Node.js/JavaScript has a set of standard structured control flow constructs that operate much like other languages. For more information on these, please take a look at MDN:

  • if/else
  • for
  • while
  • try/catch
  • throw

Wrapping it all up

Understanding Node.js control flow is critical to building large-scale applications that can be relied on to predictably deliver business value. My hope is that this article helps software engineers gain a better understanding of how Node.js handles asynchronous control flow.


Noah Mandelbaum, Software Engineer
