Upstart

Thoughts on building software and startups.

MOBYCAST
A weekly podcast about cloud native development, AWS, and building distributed systems.

Tags


Upstart

Node.js troubleshooting: Child process spawn output is (sometimes) empty

My Node.js application uses child_process.spawn() to invoke ffmpeg for inspecting audio and video files. This code had been running without fail for many months. How come it suddenly started sporadically returning empty results?


Recently, I was preparing for an important demo of Growl, the technology platform I'm working on for building podcast products and services. The demo shows off the platform's ability to perform high-speed import of podcast feeds, taking advantage of parallel processing while handling complicated signaling and locking semantics to bring it all back together.

This high-speed import feature has been in place for many, many months and performed extremely reliably. So, when doing a practice run before the demo to verify that all was working, I was dismayed to discover that there were a small number of errors happening when using ffmpeg to get metadata about the audio files. Running through the process again, I was still getting a small number of errors, albeit this time the errors were happening on a different set of audio files.

The code hadn't been changed recently, and no recent deploys had been made. What was going on?

The code in question

Here's the original version of the code that spawns ffmpeg (actually, ffprobe) to get metadata about audio and video files. It is wrapped in a Promise so the caller can use simple async/await syntax to invoke it.

NOTE: Certain parts of the original code have been omitted for brevity/readability.

Original buggy version: Spawning ffprobe and returning output

const FFPROBE_PATH = 'ffprobe';
const FFPROBE_ARGS = [ '-hide_banner', '-loglevel', 'fatal', '-print_format', 'json' ];

async function probe(filename) {

    const createPromise = new Promise((resolve, reject) => {
        const args = FFPROBE_ARGS.concat(filename);
        const proc = child_process.spawn(FFPROBE_PATH, args);

        const outputBuffers = [];

        proc.stdout.on('data', (data) => { 
            outputBuffers.push(data); 
        });

        proc.on('error', (err) => {
            reject(errnew Error(err.toString());
        });

        proc.on('exit', (code) => {
            const output = JSON.parse(outputBuffers.join(''));
            
            if (code !== 0) {
                const msg = `Failed with code = ${code}`;
                return reject(new Error(msg));
            }

            resolve(output);
        });
    });

    const props = await createPromise;

    return props;

}

The troubleshooting process

To start with, all I knew was that, rather suddenly, every once in a while when under load, JSON.parse() was throwing an exception because the input string to be parsed was empty.

My first thought was perhaps this was a memory issue. The system had a high degree of concurrency and was invoking many simultaneous child processes to run ffmpeg against the imported audio files. Everything was running within containers, with memory limits. However, after digging into the metrics, memory utilization wasn't an issue.

Next, I thought that maybe ffmpeg was intermittently exiting with an error. I updated the process's exit handler to first check the exit code value and throw an exception if not a successful result.

Updated version: Check for exit code before parsing output

    proc.on('exit', (code) => {
        if (code !== 0) {
            const msg = `Failed with code = ${code}`;
            return reject(new Error(msg));
        }

        const output = JSON.parse(outputBuffers.join(''));
        resolve(output);
    });

After re-testing, I discovered that the exit code from ffmpeg was always successful. So, there were no problems with spawning ffmpeg as a child process or with ffmpeg opening and reading the file.

It was now obvious that the most likely culprit was that stdout was not getting flushed by the time the code was trying to process the results.

RTFM

So, time to turn to the Node.js documentation. In particular I wanted to get the details on the various events emitted by the spawned child process.

Here's what the Node.js API documentation says about the exit event:

When the 'exit' event is triggered, child process stdio streams might still be open.

Whoops. That would explain why stdout was not getting flushed for some invocations.

Reading further, here's what the documentation says about the close event:

The 'close' event is emitted after a process has ended and the stdio streams of a child process have been closed. This is distinct from the 'exit' event since multiple processes might share the same stdio streams. The 'close' event will always emit after 'exit' was already emitted, or 'error' if the child failed to spawn.

So, given that the code was relying on capturing stdout to return results from the child process (ffmpeg), the code was hooking the wrong event. Instead of treating exit as the termination event, it should be listening for the close event to guarantee that the output streams have been closed.

Final correct version: Spawning ffprobe and returning output

const FFPROBE_PATH = 'ffprobe';
const FFPROBE_ARGS = [ '-hide_banner', '-loglevel', 'fatal', '-print_format', 'json' ];

async function probe(filename) {

    const createPromise = new Promise((resolve, reject) => {
        const args = FFPROBE_ARGS.concat(filename);
        const proc = child_process.spawn(FFPROBE_PATH, args);

        const outputBuffers = [];

        proc.stdout.on('data', (data) => { 
            outputBuffers.push(data); 
        });

        proc.on('close', (code) => {
            if (code !== 0) {
                const msg = `Failed with code = ${code}`;
                return reject(new Error(msg));
            }

            const output = JSON.parse(outputBuffers.join(''));
            resolve(output);
        });
    });

    const props = await createPromise;

    return props;

}

Lessons learned

What's surprising about this bug is that it was always there, right from the beginning. Just waiting for the perfect time to rear its ugly head, like right before an important demo. I suspect that I ran into this now because concurrency has increased significantly within the system, making timing issues more likely.

When this code was originally written, I was probably looking at sample code for how to wrap child_process.spawn() with a Promise. And that sample code hooked the 'exit' event, instead of the 'close' event.

Of course, the big lesson here is don't make assumptions. Don't blindly trust code snippets or examples. Make sure you understand exactly what the code is doing. Oh... and read the docs!

Entrepreneur, technologist and startup junkie. Founded two companies, one with $24M in VC, the other bootstrapped with a SBA loan. Now building startup #3, looking to shake up the podcasting industry.

View Comments