Working with the file system on Node.js

[2022-06-28] dev, javascript, nodejs
Warning: This blog post is outdated. Instead, read chapter “Working with the file system on Node.js” in “Shell scripting with Node.js”.

This blog post contains:

  • An overview of the different parts of Node’s file system APIs.
  • Recipes (code snippets) for performing various tasks via those APIs.

The focus of this post is on shell scripting, which is why we only work with textual data.

Concepts, patterns and conventions of Node’s file system APIs  

Ways of accessing files  

  1. We can read or write the whole content of a file via a string.
  2. We can open a stream for reading or a stream for writing and process a file in smaller pieces, one at a time. Streams only allow sequential access.
  3. We can use file descriptors or FileHandles and get both sequential and random access, via an API that is loosely similar to streams.
    • File descriptors are integer numbers that represent files. They are managed via these functions (only the synchronous names are shown; there are also callback-based versions such as fs.open()):
      • fs.openSync(path, flags?, mode?) opens a new file descriptor for a file at a given path and returns it.
      • fs.closeSync(fd) closes a file descriptor.
      • fs.fchmodSync(fd, mode)
      • fs.fchownSync(fd, uid, gid)
      • fs.fdatasyncSync(fd)
      • fs.fstatSync(fd, options?)
      • fs.fsyncSync(fd)
      • fs.ftruncateSync(fd, len?)
      • fs.futimesSync(fd, atime, mtime)
    • Only the synchronous API and the callback-based API use file descriptors. The Promise-based API has a better abstraction, class FileHandle, which is based on file descriptors. Instances are created via fsPromises.open(). Various operations are provided via methods (not via functions):
      • fileHandle.close()
      • fileHandle.chmod(mode)
      • fileHandle.chown(uid, gid)
      • Etc.

Note that we don’t use (3) in this blog post – (1) and (2) are enough for our purposes.

Function name prefixes  

Prefix “l”: symbolic links

Functions whose names start with an “l” usually operate on symbolic links:

  • fs.lchmodSync(), fs.lchmod(), fsPromises.lchmod()
  • fs.lchownSync(), fs.lchown(), fsPromises.lchown()
  • fs.lutimesSync(), fs.lutimes(), fsPromises.lutimes()
  • Etc.

Prefix “f”: file descriptors  

Functions whose names start with an “f” usually manage file descriptors:

  • fs.fchmodSync(), fs.fchmod()
  • fs.fchownSync(), fs.fchown()
  • fs.fstatSync(), fs.fstat()
  • Etc.

Important classes  

Several classes play important roles in Node’s file system APIs.

URLs: an alternative to file system paths in strings  

Whenever a Node.js function accepts a file system path in a string (line A), it usually also accepts an instance of URL (line B):

assert.equal(
  fs.readFileSync(
    '/tmp/text-file.txt', {encoding: 'utf-8'}), // (A)
  'Text content'
);
assert.equal(
  fs.readFileSync(
    new URL('file:///tmp/text-file.txt'), {encoding: 'utf-8'}), // (B)
  'Text content'
);

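Manually converting between paths and file: URLs seems easy but has surprisingly many pitfalls: percent encoding or decoding, Windows drive letters, etc. Instead, it’s better to use the following two functions from module node:url:

  • url.pathToFileURL(path) converts a file system path to a file: URL.
  • url.fileURLToPath(url) converts a file: URL to a file system path.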

We don’t use file URLs in this blog post. In a future blog post, we’ll see use cases for them.

Buffers  

Class Buffer represents fixed-length byte sequences on Node.js. It is a subclass of Uint8Array (a TypedArray). Buffers are mostly used when working with binary files and therefore of less interest in this blog post.

Whenever Node.js accepts a Buffer, it also accepts a Uint8Array. Thus, given that Uint8Arrays are cross-platform and Buffers aren’t, the former is preferable.

Buffers can do one thing that Uint8Arrays can’t: encoding and decoding text in various encodings. If we need to encode or decode UTF-8 in Uint8Arrays, we can use class TextEncoder or class TextDecoder. These classes are available on most JavaScript platforms:

> new TextEncoder().encode('café')
Uint8Array.of(99, 97, 102, 195, 169)
> new TextDecoder().decode(Uint8Array.of(99, 97, 102, 195, 169))
'café'

Node.js streams  

Some functions accept or return native Node.js streams:

  • stream.Readable is Node’s class for readable streams. Module node:fs uses fs.ReadStream which is a subclass.
  • stream.Writable is Node’s class for writable streams. Module node:fs uses fs.WriteStream which is a subclass.

Instead of native streams, we can now use cross-platform web streams on Node.js. The blog post “Using web streams on Node.js” explains how.

Reading and writing files  

Reading a file synchronously into a single string (optional: splitting into lines)  

fs.readFileSync(filePath, options?) reads the file at filePath into a single string:

assert.equal(
  fs.readFileSync('text-file.txt', {encoding: 'utf-8'}),
  'there\r\nare\nmultiple\nlines'
);

Pros and cons of this approach (vs. using a stream):

  • Pro: Easy to use and synchronous. Good enough for many use cases.
  • Con: Not a good choice for large files.
    • Before we can process the data, we have to read it in its entirety.

Next, we’ll look into splitting the string we have read into lines.

Splitting lines without including line terminators  

The following code splits a string into lines while removing line terminators. It works with Unix and Windows line terminators:

const RE_SPLIT_EOL = /\r?\n/;
function splitLines(str) {
  return str.split(RE_SPLIT_EOL);
}
assert.deepEqual(
  splitLines('there\r\nare\nmultiple\nlines'),
  ['there', 'are', 'multiple', 'lines']
);

“EOL” stands for “end of line”. We accept both Unix line terminators ('\n') and Windows line terminators ('\r\n', like the first one in the previous example). For more information, see section “Handling line terminators across platforms”.

Splitting lines while including line terminators  

The following code splits a string into lines while including line terminators. It works with Unix and Windows line terminators (“EOL” stands for “end of line”):

const RE_SPLIT_AFTER_EOL = /(?<=\r?\n)/; // (A)
function splitLinesWithEols(str) {
  return str.split(RE_SPLIT_AFTER_EOL);
}

assert.deepEqual(
  splitLinesWithEols('there\r\nare\nmultiple\nlines'),
  ['there\r\n', 'are\n', 'multiple\n', 'lines']
);
assert.deepEqual(
  splitLinesWithEols('first\n\nthird'),
  ['first\n', '\n', 'third']
);
assert.deepEqual(
  splitLinesWithEols('EOL at the end\n'),
  ['EOL at the end\n']
);
assert.deepEqual(
  splitLinesWithEols(''),
  ['']
);

Line A contains a regular expression with a lookbehind assertion. It matches at locations that are preceded by a match for the pattern \r?\n but it doesn’t capture anything. Therefore, it doesn’t remove anything between the string fragments that the input string is split into.

On engines that don’t support lookbehind assertions, we can use the following solution:

function splitLinesWithEols(str) {
  if (str.length === 0) return [''];
  const lines = [];
  let prevEnd = 0;
  while (prevEnd < str.length) {
    // Searching for '\n' means we’ll also find '\r\n'
    const newlineIndex = str.indexOf('\n', prevEnd);
    // If there is a newline, it’s included in the line
    const end = newlineIndex < 0 ? str.length : newlineIndex+1;
    lines.push(str.slice(prevEnd, end));
    prevEnd = end;
  }
  return lines;
}

This solution is simple, but more verbose.

In both versions of splitLinesWithEols(), we again accept both Unix line terminators ('\n') and Windows line terminators ('\r\n'). For more information, see section “Handling line terminators across platforms”.

Reading a file via a stream, line by line  

We can also read text files via streams:

import {Readable} from 'node:stream';

const nodeReadable = fs.createReadStream(
  'text-file.txt', {encoding: 'utf-8'});
const webReadableStream = Readable.toWeb(nodeReadable);
const lineStream = webReadableStream.pipeThrough(
  new ChunksToLinesStream());
for await (const line of lineStream) {
  console.log(line);
}

// Output:
// 'there\r\n'
// 'are\n'
// 'multiple\n'
// 'lines'

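We used the following external functionality:

  • fs.createReadStream(filePath, options?) creates a Node.js stream (an instance of stream.Readable).
  • stream.Readable.toWeb(streamReadable) converts a readable Node.js stream to a web stream (an instance of ReadableStream).
  • ChunksToLinesStream is a TransformStream that is not part of Node.js and has to be defined separately (see the blog post “Using web streams on Node.js”). It converts a stream of chunks with arbitrary lengths into a stream of lines.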

Web streams are asynchronously iterable, which is why we can use a for-await-of loop to iterate over lines.

If we are not interested in text lines, we don’t need ChunksToLinesStream: we can iterate over webReadableStream directly and get chunks of arbitrary lengths.

Pros and cons of this approach (vs. reading a single string):

  • Pro: Works well with large files.
    • We can process the data incrementally, in smaller pieces and don’t have to wait for everything to be read.
  • Con: More complicated to use and not synchronous.

Writing a single string to a file synchronously  

fs.writeFileSync(filePath, str, options?) writes str to a file at filePath. If a file already exists at that path, it is overwritten.

The following code shows how to use this function:

fs.writeFileSync(
  'new-file.txt',
  'First line\nSecond line\n',
  {encoding: 'utf-8'}
);

For information on line terminators, see section “Handling line terminators across platforms”.

Pros and cons (vs. using a stream):

  • Pro: Easy to use and synchronous. Works for many use cases.
  • Con: Not suited for large files.

Appending a single string to a file (synchronously)  

The following code appends a line of text to an existing file:

fs.appendFileSync(
  'existing-file.txt',
  'Appended line\n',
  {encoding: 'utf-8'}
);

We can also use fs.writeFileSync() to perform this task:

fs.writeFileSync(
  'existing-file.txt',
  'Appended line\n',
  {encoding: 'utf-8', flag: 'a'}
);

This code is almost the same as the one we used to overwrite existing content (see the previous section for more information). The only difference is that we added the option .flag: The value 'a' means that we append data. Other possible values (e.g. to throw an error if a file doesn’t exist yet) are explained in the Node.js documentation.

Watch out: In some functions, this option is named .flag, in others .flags.

Writing multiple strings to a file via stream  

The following code uses a stream to write multiple strings to a file:

import {Writable} from 'node:stream';

const nodeWritable = fs.createWriteStream(
  'new-file.txt', {encoding: 'utf-8'});
const webWritableStream = Writable.toWeb(nodeWritable);

const writer = webWritableStream.getWriter();
try {
  await writer.write('First line\n');
  await writer.write('Second line\n');
  await writer.close();
} finally {
  writer.releaseLock()
}

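We used the following functions:

  • fs.createWriteStream(filePath, options?) creates a Node.js stream (an instance of stream.Writable).
  • stream.Writable.toWeb(streamWritable) converts a writable Node.js stream to a web stream (an instance of WritableStream).
  • webWritableStream.getWriter() returns a writer. We used its methods .write(), .close() and .releaseLock().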

Pros and cons (vs. writing a single string):

  • Pro: Works well with large files because we can write the data incrementally, in smaller pieces.
  • Con: More complicated to use and not synchronous.

Appending multiple strings to a file via a stream (asynchronously)  

The following code uses a stream to append text to an existing file:

import {Writable} from 'node:stream';

const nodeWritable = fs.createWriteStream(
  'existing-file.txt', {encoding: 'utf-8', flags: 'a'});
const webWritableStream = Writable.toWeb(nodeWritable);

const writer = webWritableStream.getWriter();
try {
  await writer.write('First appended line\n');
  await writer.write('Second appended line\n');
  await writer.close();
} finally {
  writer.releaseLock()
}

This code is almost the same as the one we used to overwrite existing content (see the previous section for more information). The only difference is that we added the option .flags: The value 'a' means that we append data. Other possible values (e.g. to throw an error if a file doesn’t exist yet) are explained in the Node.js documentation.

Watch out: In some functions, this option is named .flag, in others .flags.

Handling line terminators across platforms  

Alas, not all platforms use the same line terminator characters to mark the end of a line (EOL):

  • On Windows, EOL is '\r\n'.
  • On Unix (incl. macOS), EOL is '\n'.

To handle EOL in a manner that works on all platforms, we can use several strategies.

Reading line terminators  

When reading text, it’s best to recognize both EOLs.

What might that look like when splitting a text into lines? We can include the EOLs (in either format) at the ends. That enables us to change as little as possible if we modify those lines and write them to a file.

When processing lines with EOLs, it’s sometimes useful to remove them – e.g. via the following function:

const RE_EOL_REMOVE = /\r?\n$/;
function removeEol(line) {
  const match = RE_EOL_REMOVE.exec(line);
  if (!match) return line;
  return line.slice(0, match.index);
}

assert.equal(
  removeEol('Windows EOL\r\n'),
  'Windows EOL'
);
assert.equal(
  removeEol('Unix EOL\n'),
  'Unix EOL'
);
assert.equal(
  removeEol('No EOL'),
  'No EOL'
);

Writing line terminators  

When it comes to writing line terminators, we have two options:

  • Constant EOL in module 'node:os' contains the EOL of the current platform.
  • We can detect the EOL format of an input file and use that when we change that file.

Traversing and creating directories  

Traversing a directory  

The following function traverses a directory and lists all of its descendants (its children, the children of its children, etc.):

import * as path from 'node:path';

function* traverseDirectory(dirPath) {
  const dirEntries = fs.readdirSync(dirPath, {withFileTypes: true});
  // Sort the entries to keep things more deterministic
  dirEntries.sort(
    (a, b) => a.name.localeCompare(b.name, 'en')
  );
  for (const dirEntry of dirEntries) {
    const fileName = dirEntry.name;
    const pathName = path.join(dirPath, fileName);
    yield pathName;
    if (dirEntry.isDirectory()) {
      yield* traverseDirectory(pathName);
    }
  }
}

We used this functionality:

  • fs.readdirSync(thePath, options?) returns the children of the directory at thePath.
    • If option .withFileTypes is true, the function returns directory entries, instances of fs.Dirent. These have properties such as:
      • dirent.name
      • dirent.isDirectory()
      • dirent.isFile()
      • dirent.isSymbolicLink()
    • If option .withFileTypes is false or missing, the function returns strings with file names.

The following code shows traverseDirectory() in action:

for (const filePath of traverseDirectory('dir')) {
  console.log(filePath);
}

// Output:
// 'dir/dir-file.txt'
// 'dir/subdir'
// 'dir/subdir/subdir-file1.txt'
// 'dir/subdir/subdir-file2.csv'

Creating a directory (mkdir, mkdir -p)  

We can use the following function to create directories:

fs.mkdirSync(thePath, options?): undefined | string

options.recursive determines how the function creates the directory at thePath:

  • If .recursive is missing or false, mkdirSync() returns undefined and an exception is thrown if:

    • A directory (or file) already exists at thePath.
    • The parent directory of thePath does not exist.
  • If .recursive is true:

    • It’s OK if there is already a directory at thePath.
    • The ancestor directories of thePath are created as needed.
    • mkdirSync() returns the path of the first newly created directory.

This is mkdirSync() in action:

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
  ]
);
fs.mkdirSync('dir/sub/subsub', {recursive: true});
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/sub',
    'dir/sub/subsub',
  ]
);

Function traverseDirectory(dirPath) lists all descendants of the directory at dirPath.

Ensuring that a parent directory exists  

If we want to set up a nested file structure on demand, we can’t always be sure that the ancestor directories exist when we create a new file. Then the following function helps:

import * as path from 'node:path';

function ensureParentDirectory(filePath) {
  const parentDir = path.dirname(filePath);
  if (!fs.existsSync(parentDir)) {
    fs.mkdirSync(parentDir, {recursive: true});
  }
}

Here we can see ensureParentDirectory() in action (line A):

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
  ]
);
const filePath = 'dir/sub/subsub/new-file.txt';
ensureParentDirectory(filePath); // (A)
fs.writeFileSync(filePath, 'content', {encoding: 'utf-8'});
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/sub',
    'dir/sub/subsub',
    'dir/sub/subsub/new-file.txt',
  ]
);

Creating a temporary directory  

fs.mkdtempSync(pathPrefix, options?) creates a temporary directory: It appends 6 random characters to pathPrefix, creates a directory at the new path and returns that path.

pathPrefix shouldn’t end with a capital “X” because some platforms replace trailing Xs with random characters.

If we want to create our temporary directory inside an operating-system-specific global temporary directory, we can use function os.tmpdir():

import * as os from 'node:os';
import * as path from 'node:path';

const pathPrefix = path.resolve(os.tmpdir(), 'my-app');
  // e.g. '/var/folders/ph/sz0384m11vxf/T/my-app'

const tmpPath = fs.mkdtempSync(pathPrefix);
  // e.g. '/var/folders/ph/sz0384m11vxf/T/my-app1QXOXP'

It’s important to note that temporary directories are not automatically removed when a Node.js script terminates. We either have to delete them ourselves or rely on the operating system to periodically clean up its global temporary directory (which it may or may not do).
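
One way of deleting a temporary directory ourselves is a try-finally statement. The following sketch wraps that pattern in a hypothetical helper function withTempDirectory(); neither the function nor the prefix 'my-app-' is part of Node.js.

```javascript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical helper: creates a temporary directory, passes its path
// to `callback` and removes the directory afterwards.
function withTempDirectory(prefix, callback) {
  const tmpPath = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  try {
    return callback(tmpPath);
  } finally {
    // Clean up, even if `callback` threw an exception
    fs.rmSync(tmpPath, {recursive: true, force: true});
  }
}

let usedPath;
const existedDuringCallback = withTempDirectory('my-app-', (tmpPath) => {
  usedPath = tmpPath;
  fs.writeFileSync(path.join(tmpPath, 'scratch.txt'), 'data');
  return fs.existsSync(tmpPath);
});
const existsAfterwards = fs.existsSync(usedPath);
```

Note that this only helps if the script terminates normally or via an exception. If the process is killed, the directory remains.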

Copying, renaming, moving files or directories  

Copying files or directories  

fs.cpSync(srcPath, destPath, options?) copies a file or directory from srcPath to destPath. Interesting options:

  • .recursive (default: false): Directories (including empty ones) are only copied if this option is true.
  • .force (default: true): If true, existing files are overwritten. If false, existing files are preserved.
    • In the latter case, setting .errorOnExist to true leads to errors being thrown if file paths clash.
  • .filter is a function that lets us control which files are copied.
  • .preserveTimestamps (default: false): If true, the copies in destPath get the same timestamps as the originals in srcPath.

This is the function in action:

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir-orig',
    'dir-orig/some-file.txt',
  ]
);
fs.cpSync('dir-orig', 'dir-copy', {recursive: true});
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir-copy',
    'dir-copy/some-file.txt',
    'dir-orig',
    'dir-orig/some-file.txt',
  ]
);

Function traverseDirectory(dirPath) lists all descendants of the directory at dirPath.
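
The options .filter and .force can be sketched as follows. The example assumes a Node.js version that supports fs.cpSync(); all directory and file names are made up.

```javascript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cp-demo-'));
const srcDir = path.join(tmpDir, 'dir-orig');
const destDir = path.join(tmpDir, 'dir-copy');
fs.mkdirSync(srcDir);
fs.writeFileSync(path.join(srcDir, 'keep.txt'), 'original');
fs.writeFileSync(path.join(srcDir, 'skip.log'), 'log data');

// .filter: copy everything except paths that end with '.log'
fs.cpSync(srcDir, destDir, {
  recursive: true,
  filter: (src) => !src.endsWith('.log'),
});
const copiedTxt = fs.existsSync(path.join(destDir, 'keep.txt'));
const copiedLog = fs.existsSync(path.join(destDir, 'skip.log'));

// .force: false preserves files that already exist at the destination
fs.writeFileSync(path.join(destDir, 'keep.txt'), 'edited copy');
fs.cpSync(srcDir, destDir, {recursive: true, force: false});
const preserved = fs.readFileSync(
  path.join(destDir, 'keep.txt'), {encoding: 'utf-8'});

fs.rmSync(tmpDir, {recursive: true});
```

The filter function is also invoked for directories (including srcDir itself), so it must return true for every directory whose content we want to copy.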

Renaming or moving files or directories  

fs.renameSync(oldPath, newPath) renames or moves a file or a directory from oldPath to newPath.

Let’s use this function to rename a directory:

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'old-dir-name',
    'old-dir-name/some-file.txt',
  ]
);
fs.renameSync('old-dir-name', 'new-dir-name');
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'new-dir-name',
    'new-dir-name/some-file.txt',
  ]
);

Here we use the function to move a file:

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/subdir',
    'dir/subdir/some-file.txt',
  ]
);
fs.renameSync('dir/subdir/some-file.txt', 'some-file.txt');
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/subdir',
    'some-file.txt',
  ]
);

Function traverseDirectory(dirPath) lists all descendants of the directory at dirPath.

Removing files or directories  

Removing files and arbitrary directories (shell: rm, rm -r)  

fs.rmSync(thePath, options?) removes a file or directory at thePath. Interesting options:

  • .recursive (default: false): Directories (including empty ones) are only removed if this option is true.
  • .force (default: false): If false, an exception will be thrown if there is no file or directory at thePath.

Let’s use fs.rmSync() to remove a file:

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/some-file.txt',
  ]
);
fs.rmSync('dir/some-file.txt');
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
  ]
);

Here we use fs.rmSync() to recursively remove a non-empty directory.

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/subdir',
    'dir/subdir/some-file.txt',
  ]
);
fs.rmSync('dir/subdir', {recursive: true});
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
  ]
);

Function traverseDirectory(dirPath) lists all descendants of the directory at dirPath.

Removing an empty directory (shell: rmdir)  

fs.rmdirSync(thePath, options?) removes an empty directory (an exception is thrown if a directory isn’t empty).

The following code shows how this function works:

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/subdir',
  ]
);
fs.rmdirSync('dir/subdir');
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
  ]
);

Function traverseDirectory(dirPath) lists all descendants of the directory at dirPath.

Clearing directories  

A script that saves its output to a directory dir often needs to clear dir before it starts, that is, remove every entry in dir so that it is empty. The following function does that.

import * as path from 'node:path';

function clearDirectory(dirPath) {
  for (const fileName of fs.readdirSync(dirPath)) {
    const pathName = path.join(dirPath, fileName);
    fs.rmSync(pathName, {recursive: true});
  }
}

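We used two file system functions:

  • fs.readdirSync(dirPath) returns the names of all children of the directory at dirPath.
  • fs.rmSync(pathName, options?) removes files and directories (including non-empty ones).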

This is an example of using clearDirectory():

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/dir-file.txt',
    'dir/subdir',
    'dir/subdir/subdir-file.txt'
  ]
);
clearDirectory('dir');
assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
  ]
);

Trashing files or directories  

The library trash moves files and folders to the trash. It works on macOS, Windows, and Linux (where support is limited and help is wanted). This is an example from its readme file:

import trash from 'trash';

await trash(['*.png', '!rainbow.png']);

trash() accepts either an Array of strings or a string as its first parameter. Any string can be a glob pattern (with asterisks and other meta-characters).

Reading and changing file system entries  

Checking if a file or directory exists  

fs.existsSync(thePath) returns true if a file or directory exists at thePath:

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/some-file.txt',
  ]
);
assert.equal(
  fs.existsSync('dir'), true
);
assert.equal(
  fs.existsSync('dir/some-file.txt'), true
);
assert.equal(
  fs.existsSync('dir/non-existent-file.txt'), false
);

Function traverseDirectory(dirPath) lists all descendants of the directory at dirPath.

Checking the stats of a file: Is it a directory? When was it created? Etc.  

fs.statSync(thePath, options?) returns an instance of fs.Stats with information on the file or directory at thePath.

Interesting options:

  • .throwIfNoEntry (default: true): What happens if there is no file system entry at thePath?
    • If this option is true, an exception is thrown.
    • If it is false, undefined is returned.
  • .bigint (default: false): If true, this function uses bigints for numeric values (such as timestamps, see below).

Properties of instances of fs.Stats:

  • What kind of file system entry is it?
    • stats.isFile()
    • stats.isDirectory()
    • stats.isSymbolicLink()
  • stats.size is the size in bytes
  • Timestamps:
    • There are three kinds of timestamps:
      • stats.atime: time of last access
      • stats.mtime: time of last modification
      • stats.birthtime: time of creation
    • Each of these timestamps can be specified with three different units – for example, atime:
      • stats.atime: instance of Date
      • stats.atimeMs: milliseconds since the POSIX Epoch
      • stats.atimeNs: nanoseconds since the POSIX Epoch (requires option .bigint)
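
As a sketch of how these properties behave, the following code examines a small file with and without option .bigint. The directory prefix and the file name are made up.

```javascript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'stat-demo-'));
const filePath = path.join(tmpDir, 'some-file.txt');
fs.writeFileSync(filePath, 'hello', {encoding: 'utf-8'});

// Default: numeric values are numbers
const stats = fs.statSync(filePath);
const size = stats.size;
const mtimeIsDate = stats.mtime instanceof Date;
const typeOfMtimeMs = typeof stats.mtimeMs;

// With .bigint, numeric values are bigints and
// nanosecond timestamps are available
const bigStats = fs.statSync(filePath, {bigint: true});
const bigSize = bigStats.size;
const typeOfMtimeNs = typeof bigStats.mtimeNs;

fs.rmSync(tmpDir, {recursive: true});
```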

In the following example, we use fs.statSync() to implement a function isDirectory():

function isDirectory(thePath) {
  const stats = fs.statSync(thePath, {throwIfNoEntry: false});
  return stats !== undefined && stats.isDirectory();
}

assert.deepEqual(
  Array.from(traverseDirectory('.')),
  [
    'dir',
    'dir/some-file.txt',
  ]
);

assert.equal(
  isDirectory('dir'), true
);
assert.equal(
  isDirectory('dir/some-file.txt'), false
);
assert.equal(
  isDirectory('non-existent-dir'), false
);

Function traverseDirectory(dirPath) lists all descendants of the directory at dirPath.

Changing file attributes: permissions, owner, group, timestamps  

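Let’s briefly look at functions for changing file attributes:

  • fs.chmodSync(path, mode) changes the permissions of a file.
  • fs.chownSync(path, uid, gid) changes the owner and group of a file.
  • fs.utimesSync(path, atime, mtime) changes the timestamps of a file.

Functions for working with hard links:

  • fs.linkSync(existingPath, newPath) creates a hard link.
  • fs.unlinkSync(path) removes a hard link and possibly the file it points to (if it is the last hard link to that file).

Functions for working with symbolic links:

  • fs.symlinkSync(target, path, type?) creates a symbolic link from path to target.
  • fs.readlinkSync(path, options?) returns the target of the symbolic link at path.

The following functions operate on symbolic links without dereferencing them (note the name prefix “l”):

  • fs.lchmodSync(path, mode)
  • fs.lchownSync(path, uid, gid)
  • fs.lutimesSync(path, atime, mtime)
  • fs.lstatSync(path, options?)

Other useful functions:

  • fs.realpathSync(path, options?) computes the canonical pathname by resolving dots (.), double dots (..), and symbolic links.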

Options of functions that affect how symbolic links are handled:

  • fs.cpSync(src, dest, options?):
    • .dereference (default: false): If true, copy the files that symbolic links point to, not the symbolic links themselves.
    • .verbatimSymlinks (default: false): If false, the target of a copied symbolic link will be updated so that it still points to the same location. If true, the target won’t be changed.
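
To illustrate what dereferencing means, the following sketch creates a symbolic link and examines it with and without following it. It assumes a platform where the current user may create symbolic links (on Windows, that can require special privileges); all names are made up.

```javascript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'link-demo-'));
const targetPath = path.join(tmpDir, 'target.txt');
const linkPath = path.join(tmpDir, 'link.txt');
fs.writeFileSync(targetPath, 'content', {encoding: 'utf-8'});
fs.symlinkSync(targetPath, linkPath);

// lstatSync() examines the link itself (no dereferencing)
const linkIsSymlink = fs.lstatSync(linkPath).isSymbolicLink();
// statSync() follows the link and examines its target
const followedIsFile = fs.statSync(linkPath).isFile();
// readlinkSync() returns where the link points to
const linkTarget = fs.readlinkSync(linkPath);
// Reading via the link gives us the content of the target
const viaLink = fs.readFileSync(linkPath, {encoding: 'utf-8'});

fs.rmSync(tmpDir, {recursive: true});
```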

Further reading