
Node.js Architecture Pitfalls to Avoid

Nate Anderson


Architecting a maintainable, readable, and reliable codebase is important not only for the people who will work on the code after you, but also for your own sanity.

I’ve worked on dozens of production Node.js backend services and will share my experience of architecture pitfalls to avoid when building yours.

The examples in this article are tailored to JavaScript and TypeScript, and some of the problems (like dependency management) require more attention in Node.js projects than in other languages. However, many of these principles apply to other languages too.

Let's get going!

Why Worry About Node.js Architecture?

When I started my first mid-level developer job, I left a company that used frameworks for every project and joined a department that wrote every backend service from scratch in Node.js.

I think many devs have experienced this moment. Juniors often work in established codebases or on smaller projects where the size and simplicity keep things under control.

But a day may come when you'll need to start a new project; the decisions you make at the beginning might affect your brand's reliability, revenue, and customer experiences for years to come. Thoughtful coding and architecture in a project's early days often impact the development experience more than specific library or framework choices.

Globals in JavaScript: You Were Warned for a Reason!

If you've ever taken a formal computer science course, your professor likely warned you to avoid using global variables but may not have explained why.

Globals are dangerous for several reasons. Say you import a function from one module into another, and that function depends on a global variable. The global acts like an invisible argument to your function, one the caller has no knowledge of or control over.

Worse yet, if a program mutates that global variable, you must consider its state and order of operations whenever you access the variable. I find it is almost always safer to pass a variable to a function or class constructor directly than to depend on a global.

The Problem with Globals: An Example

Arguments are more explicit and self-documenting than globals. Callers of a function can see exactly what data it needs, because each value sits right in the function signature under a relevant name. Consider a function that alerts a system administrator to some event:

```javascript
const adminEmail = "admin@example.com";

// notify an admin about a low-priority issue
export function notifyAdmin(subject, message) {
  emailClient.send(
    { subject, message, recipient: adminEmail },
    (error, result) => {
      if (error) {
        return console.error(error);
      }
      console.log("Email sent successfully!");
    }
  );
}

// ...

// critical error, needs attention now. wake up the owner!
export function wakeOwnerUp(subject, message) {
  pagerClient.wakeUp(
    {
      user: adminEmail,
      priority: "high",
      respectQuietHours: false,
    },
    (error, result) => console.log({ error, result })
  );
}
```

This is, admittedly, a dramatic and cheesy example. But notice how any code that calls notifyAdmin has no control over who is contacted.

Sure, it's easy enough to change the value of adminEmail and redeploy the code, but because that global is used in at least one other function, this change can have cascading effects. What if the owner and the admin are no longer the same person? We don't want to be the ones responsible for waking our boss up at 3 AM when it's some poor developer-on-duty's job to have a look.

Furthermore, what if we need to send an email to more than one address? We can't simply change adminEmail to an array and expect everything to work.

The Fix: Using Arguments

If we make notifyAdmin into a more generic notifyEmail function, we can call it many times with as many different addresses as we like. And wakeOwnerUp and notifyEmail are no longer tightly coupled — they can interact with different people. This change also makes it easier to adjust our notification settings without redeploying the entire project; the admin and owner emails can come from a database instead of a hard-coded variable.
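
Here's a sketch of that refactor, keeping the hypothetical emailClient and pagerClient from the example above:

```javascript
// notify any address about a low-priority issue
export function notifyEmail(address, subject, message) {
  emailClient.send(
    { subject, message, recipient: address },
    (error, result) => {
      if (error) {
        return console.error(error);
      }
      console.log("Email sent successfully!");
    }
  );
}

// critical error, needs attention now. wake up whoever is on call!
export function wakeOwnerUp(address, subject, message) {
  pagerClient.wakeUp(
    { user: address, priority: "high", respectQuietHours: false },
    (error, result) => console.log({ error, result })
  );
}
```

Now the caller decides who gets contacted, whether that's a single admin, a list of addresses in a loop, or values fetched from a database.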

There are some more nuanced benefits to eschewing globals in favor of arguments. Global values are often runtime configuration parameters that change depending on the environment, so testing your code thoroughly means testing its behavior as these values change.

Let's say we're writing tests for the example above. We've mocked the emailClient and pagerClient and want to assert that the email address these clients will use matches the expected value in our test setup.

That's currently impossible without opening the source code file and copying the email address to a new variable in the test.

Set Up A Test That Goes to One Email

Wouldn't it be easier to test that the code just emails the address we ask it to? We have two options here:

  1. Export the adminEmail variable from its module and allow tests (and other code) to change its value at runtime.
  2. Make the email address an argument to the function calls.

As previously mentioned, mutable globals require extreme care to avoid causing unpredictable and unwanted results. This is true in tests too — especially if your tests run in parallel, as many modern test runners will encourage.

Imagine test A, a unit test, changes the value of adminEmail to assert that an email is sent to the correct address. But test B, an integration test of a function that calls wakeOwnerUp, asserts that a different user receives a pager call. If your tests run concurrently, you're likely to experience a flap, a test that passes inconsistently.

Instead, I would encourage you to go with option 2 (sketched after this list):

  • Define your expected outcome in the test setup
  • Pass a value into the function being tested
  • Assert that the result matches the expected value by inspecting the return value or mocking one of its dependencies
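
Here's what that flow might look like as a test. This is a minimal sketch that assumes a Jest-style runner and an emailClient we can swap out for a stub; both the stub and the notifyEmail signature follow the refactor above:

```javascript
test("notifyEmail sends to the address it is given", () => {
  // hypothetical stub: record calls instead of sending real email
  const sent = [];
  emailClient.send = (options, callback) => {
    sent.push(options);
    callback(null, {});
  };

  // the expected address lives in the test setup, not in a global
  const address = "oncall@example.com";
  notifyEmail(address, "Disk almost full", "Please take a look.");

  expect(sent[0].recipient).toBe(address);
});
```

No source file needs to be opened and copied from, and parallel tests can each use their own address without stepping on shared state.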

Another benefit of passing data as an argument is that it reduces possible data races, like those that may occur when your program starts up.

Suppose your app connects to a database through a global connection variable, and it sets up this connection using callbacks or asynchronous code. The app may reach a "ready" state and start accepting requests before it even attempts to connect to its database. For example:

```javascript
import { connect } from "myDatabase";
import { router } from "pronto";

var DB;

function main() {
  const app = router();

  app.get("/", (request, response) => {
    DB.query("SELECT * FROM users", (error, rows) => {
      if (error) {
        return response.error(error);
      }
      return response.json(rows);
    });
  });

  connect("localhost:3305", (error, connection) => {
    if (error) {
      console.log(`Failed to connect to DB: error ${error}`);
      process.exit(1);
    }
    DB = connection;
  });

  app.listen();
}
```

If the process of connecting to your database is slow (which isn't all that unlikely), the asynchronous nature of JavaScript means your app.get and app.listen calls may be reached well before your DB variable is assigned. Your service is online but not usable.
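
One way to close that gap, sticking with the hypothetical myDatabase and pronto modules from the example: register routes and call app.listen only after the connection callback fires, and pass the connection along explicitly. A sketch:

```javascript
import { connect } from "myDatabase";
import { router } from "pronto";

function main() {
  // connect first; only start accepting requests once the DB is ready
  connect("localhost:3305", (error, connection) => {
    if (error) {
      console.log(`Failed to connect to DB: error ${error}`);
      process.exit(1);
    }

    const app = router();
    app.get("/", (request, response) => {
      connection.query("SELECT * FROM users", (error, rows) => {
        if (error) {
          return response.error(error);
        }
        return response.json(rows);
      });
    });
    app.listen();
  });
}

main();
```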

Environment Variables in JavaScript: Socially Acceptable Globals

Environment variables are often an excellent way to configure your application. They're nearly universal, supported by every major OS and programming language.

They're flexible, too; you can:

  • configure them system-wide or per-user in the shell
  • specify them at program launch by prefixing the command
  • configure them easily in Dockerfiles, bash scripts, and .env files

But environment variables can be dangerous if you aren't careful about how you access them. If you think about it from your program's perspective, an environment variable is just a global variable owned by the operating system instead of the program.

Just like the first global variable example above, relying on an environment variable at the wrong level of abstraction makes your code difficult to test and reason about.

```javascript
export function notifyAdmin(subject, message) {
  const address = process.env.ADMIN_EMAIL;
  emailClient.send(
    { subject, message, recipient: address },
    (error, result) => {
      if (error) {
        return console.error(error);
      }
      console.log("Email sent successfully!");
    }
  );
}
```

While the ways we can configure ADMIN_EMAIL as an environment variable are infinitely more flexible than a hard-coded constant in the code, we are still susceptible to some familiar problems:

  1. Reliance on global state makes the function's behavior unclear.
  2. Changing environment variables affects all the code points that read that variable.
  3. It's difficult to concurrently test that the code responds appropriately to different values of the environment variable.

But, unlike hard-coded globals, the moral of this section is not "don't use environment variables!". Instead, it's "be careful where you read environment variables."

Instead of reading from process.env throughout your code, define a configuration object that contains your environment variables under named keys. If you're using TypeScript, this even gives you some compile-time guarantees for code accessing your config variables.

Plus, if you ever need to read configuration values from a source other than your environment, none of the downstream code changes. Only the code that actually reads the values into memory changes.

A neat way to gather up all your configuration variables at program startup is to use an object and the nullish coalescing operator (??) to handle missing configuration values.

```javascript
const config = {
  dbHost: process.env.DB_HOST ?? "localhost",
  dbPort: process.env.DB_PORT ?? 3306,
  dbUser: process.env.DB_USER ?? "web",
  dbPass: process.env.DB_PASS ?? "",
};
```

The details of the code that reads the configuration aren't super important. Obviously, there are cases where environment variables will change at runtime, requiring a more involved solution than the one presented here. The critical thing is to use this method to read configuration everywhere you can and avoid reading directly from the environment at runtime.

Make Your JavaScript Code Easy to Test with Environment Variables: An Example

Here's an example of how this approach makes your code more testable. Say we are testing a database migration tool and want to run tests in parallel against different databases. A function that relies directly on environment variables cannot do this.

```javascript
import { connect } from "myDatabase";

async function migrate(migrations) {
  const { DB_HOST, DB_USER, DB_PORT, DB_PASS } = process.env;
  const db = connect(`${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}`);
  for (const migration of migrations) {
    await db.exec(migration);
  }
}
```

Simply rewriting this function to take a configuration object as an argument instead of reading from the process.env global makes concurrent testing possible.

```javascript
import { connect } from "myDatabase";

async function migrate(migrations, config) {
  const db = connect(
    `${config.DB_USER}:${config.DB_PASS}@${config.DB_HOST}:${config.DB_PORT}`
  );
  for (const migration of migrations) {
    await db.exec(migration);
  }
}
```

We can call this function as many times, in as many concurrent invocations, as we want, and each invocation will use the exact database connection details we provide. The tests can get these values anywhere they please — coupling to a specific environment is removed.
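
For instance, a test could apply the same migrations to two throwaway databases at once. Here's a minimal sketch, assuming a Jest-style runner and two local test databases on made-up ports:

```javascript
test("migrations apply cleanly against both test databases", async () => {
  const migrations = ["CREATE TABLE users (id INT PRIMARY KEY)"];

  // each invocation gets its own connection details (hypothetical local ports)
  const configA = { DB_USER: "test", DB_PASS: "", DB_HOST: "localhost", DB_PORT: 3310 };
  const configB = { DB_USER: "test", DB_PASS: "", DB_HOST: "localhost", DB_PORT: 3311 };

  await Promise.all([migrate(migrations, configA), migrate(migrations, configB)]);
});
```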

Request-scoped Dependencies: Not Global Enough?

In almost the opposite vein of the previous two pitfalls, a variable's scope can be too small. Certain resources are meant to be long-lived and shared between requests:

  • Database connections are often pooled by default and should be limited to one pool per process.
  • The number of files a process may have open at once can be limited by the OS. By leaving files open, you might run out of available file descriptors.

Let's look at a common example of this problem. Many backend web development tutorials either demonstrate assigning your database connection to a global variable or instantiating a connection on each request. I have discouraged both practices in this piece, so let's look at an alternative.

A request-scoped database connection might look something like this:

```javascript
import { connect } from "myDatabase";
import { router } from "pronto";

const app = router();
const dbCreds = "user:pass@localhost:3305";

app.get("/users", async (request, response) => {
  const db = await connect(dbCreds);
  const users = db.query("SELECT id, first_name, last_name, email FROM user");
  response.json({ users });
});
```

This will likely work fine when you test it on your own machine, with only one active user. But what happens when the site goes live? If the connect function here returns a connection pool instead of a single connection, each request will result in multiple connections to the database, even though only one is being used.

It's worth noting that not all database connection libraries pool by default, so pay attention to the docs for the one you're using.

Suppose we define our routes inside a class or function instead. We can pass the database connection in once we're sure it's ready, and use a single database connection throughout our app, referencing it as needed.

This technique is called dependency injection, and I've written a rundown of using dependency injection in JavaScript and TypeScript. Here's an example:

```javascript
import { connect } from "myDatabase";
import { router } from "pronto";

const dbCreds = "user:pass@localhost:3305";

function routes(app, db) {
  app.get("/users", async (request, response) => {
    const users = db.query("SELECT id, first_name, last_name, email FROM user");
    response.json({ users });
  });
}

async function main() {
  const db = await connect(dbCreds);
  const app = router();
  routes(app, db);
}
```

Encapsulating our routing logic into functions makes it easy to write routes without thinking about dependencies until it's time for the code to actually run. It also makes test setup easier, allowing you to test your routes with a mock router or database if you choose.
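
For example, here's a sketch of a route test that uses hand-rolled fakes for both the router and the database. The Jest-style runner and both fakes are assumptions, not part of the original code:

```javascript
test("GET /users returns rows from the database", async () => {
  // hypothetical fakes: a db that returns canned rows,
  // and an app that just records its registered handlers
  const fakeDb = {
    query: () => [
      { id: 1, first_name: "Ada", last_name: "Lovelace", email: "ada@example.com" },
    ],
  };
  const handlers = {};
  const fakeApp = { get: (path, handler) => (handlers[path] = handler) };

  routes(fakeApp, fakeDb);

  let body;
  await handlers["/users"]({}, { json: (payload) => (body = payload) });

  expect(body.users).toHaveLength(1);
});
```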

It's best to limit this pattern to resources designed to be long-lived. There are many times when request-scoped variables are completely appropriate. When deciding how to scope your dependency variables, you should consult library documentation for things like:

  • database connections
  • caching server connections
  • any other external stateful dependency

Node.js Modules: A Slippery Slope

The Node.js ecosystem has a reputation for overusing third-party modules, and it’s not entirely undeserved.

Simple, popular example projects that rank highly on Google often carry node_modules directories of hundreds of megabytes spread across dozens of packages. Newcomers to the ecosystem are encouraged to npm install early and often.

| Repo | Forks | Stars | node_modules Size | Vulnerabilities | Last Commit |
| --- | --- | --- | --- | --- | --- |
| Example Express app | 1.5k | 3.3k | 123 MB | 45 high, 14 critical | 5 years ago |
| Node.js Integration Tests Best Practices | 120 | 2.3k | 288 MB | 16 high, 2 critical | 6 months ago |
| Bulletproof Node.js API | 1.1k | 4.8k | 201 MB | 8 high, 5 critical | 1 year ago |

I don't mean to call out these repos; the authors generously provided examples and information that have helped thousands of people get started coding in Node.js.

And these are not unusual examples; I've seen Node.js projects with gigabytes of third-party dependencies. Still, they are worth pointing at, especially "Bulletproof Node.js API": NPM identified over a dozen high- and critical-priority vulnerabilities in what is meant to be a showcase of security best practices, and that repo has only gone untouched for a year!

The Issue with Third-party Libraries for Node.js

So what's wrong with depending on lots of third-party libraries? I pointed out the disk space used by these example projects, but storage these days is one of the cheapest available resources. It's certainly cheaper than developer time. So am I encouraging you to use only the Node.js standard library and write every other piece of code you need by yourself?

Definitely not. There will always be areas of expertise where we need to borrow from other developers, or areas of our business mission that are critical enough to merit relying on highly-tested and well-known libraries.

What we should be aware of, however, are the types of dependencies we bring in. I'd like to propose a few metrics to examine before including a dependency in your project, and then call out a few metrics that I think get too much attention.

Metrics to Assess Third-party Libraries

Some good metrics to evaluate third-party packages include:

  • Test coverage: does the package publish a test coverage figure? Is that number similar to, or optimally greater than, that of your own project? Note that branch coverage and statement coverage are very different numbers. Be sure to compare like to like.
  • Subdependencies: does this package import the world to get the job done? Will you have to monitor for vulnerabilities in code you don't even consume?
  • Usage: does the package have many downloads on NPM or stars on GitHub? If not, it may not have been used thoroughly enough to identify edge cases and bugs that could affect your code. Note that this risk is mitigated somewhat if the package is very well-tested.
  • Usability: does this library solve multiple problems you will need to address?

For example, say you need to remove items from an array.

While you could write this code yourself, you could also import a third-party library to do it. There's an NPM package called remove-array-items that will do the trick.

If we install a single package to solve this problem, and another for the next problem we encounter, our dependencies will quickly spiral out of control. Instead, we could turn to a utility library like lodash, which will solve hundreds of little problems like this, and add only one dependency to our package.json.
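
To illustrate, lodash's _.remove covers the array case above, and its sibling utilities cover many similar chores:

```javascript
import _ from "lodash";

const jobs = [
  { id: 1, status: "done" },
  { id: 2, status: "pending" },
  { id: 3, status: "done" },
];

// _.remove mutates the array in place and returns the removed items
const finished = _.remove(jobs, (job) => job.status === "done");

console.log(jobs); // [{ id: 2, status: "pending" }]
console.log(finished.length); // 2
```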

Metrics to Avoid

Some problematic metrics for evaluating third-party packages:

  • Date of last release: while it's tempting to assume that a package that hasn't been updated in a year is outdated or abandoned, this may not always be the case.

Express, one of the most downloaded backend libraries on NPM, frequently goes six months to a year without a release. In fact, Express 5.0 has been in alpha for eight years. But a library that has been in use for as long, and by as many people, as Express is unlikely to surprise you.

  • Shininess: the JavaScript ecosystem has a reputation for producing an exciting new framework or backbone library every month. I highly recommend allowing these freshly-baked treats to cool off before you take a bite. While new tech is always exciting, you don't want your production app to be the reason a major bug is discovered.

I present these do and don't metrics not as a definitive rule set, but as a gentle guide towards producing more reliable software. Every decision has trade-offs: a larger library may have features you don't use and take up more disk space than a selection of single-purpose tools. There are many facets to consider.

I have seen many Node.js projects, even those under active development, fall prey to dozens of vulnerabilities in the worst case, and have their dependencies go badly out of date in the best case.

Letting your out-of-date dependencies pile up can keep you from responding quickly to changing business needs, or prevent you from upgrading your Node.js version quickly if your release is approaching end-of-life.

If you think of the code you write as your child, the code you install is your pet. It doesn't require the same degree of care, and it can make your life easy and fun at times, but you still have to clean up its messes.

Wrapping Up: Work Smarter, Not Harder

In this post, we touched on:

  • Why it's important to worry about Node.js architecture
  • Why arguments are preferable to globals
  • How environment variables can be useful

We also looked at issues around request-scoped dependencies, before sharing some metrics to assess third-party Node.js libraries.

If I were to draw a single thread that runs through this article, it would be "code thoughtfully". It's easy to get excited and rush into a new project. The thrill of getting something to work for the first time often outweighs the fear of its future cost. And we're developers — we're told to move fast and break things.

I simply argue that a few moments of thought at the start of a project can save days (or weeks) of pain once it's in motion.

Consider the testability, maintainability, and security of your code where it matters. These decisions will pay dividends down the road.

Happy coding!

P.S. If you liked this post, subscribe to our JavaScript Sorcery list for a monthly deep dive into more magical JavaScript tips and tricks.

P.P.S. If you need an APM for your Node.js app, go and check out the AppSignal APM for Node.js.

Our guest author Nate Anderson is a software engineer at LTK. He believes the key to great technology is a close relationship between product and engineering, and is excited about tech like Go, TypeScript, GraphQL, and serverless computing.
