Making a Podcast Transcription Server with Express.js

Ian Lavery
Published in Picovoice
May 19, 2022

I recently had an idea: create a server that transcribes content from an audio RSS feed. I’m a podcast lover (since at least 2006!), and on a few of my favourite long-running shows the hosts frequently mention that a topic came up in an earlier episode, followed by “I wish we could search for this!” In that moment, I can hear a late-night infomercial exclaiming “there’s got to be a better way!” Searching audio is a difficult task, but searching text is a well-solved problem. If we had a text equivalent of each episode in a podcast RSS feed, we could index it, do topic analysis on it, create summaries, and so on.

These days, every great idea needs great software behind it. I was also looking to learn Express.js, a popular framework for Node.js web apps. I chose this as my web framework because of its minimalist, performance-focused design. Also, all the cool kids are doing it.

Now, on to the transcription software. Picovoice recently released the Leopard Speech-to-Text engine, which offers fully on-device audio transcription. With Leopard, we can download new items from the RSS feed directly to our transcription server and process them there, which eliminates the bandwidth and cloud costs of sending the audio to a cloud provider for processing.

With the tech sorted out, we can get to coding!

Setting up a Server with Express.js

Our backend setup is going to be very simple: a single transcription endpoint at /rss-transcribe that accepts an RSS URL, plus a general index page for testing.

The test page will allow me to manually enter a podcast RSS feed URL and it will be sent to the /rss-transcribe backend for processing.
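A minimal sketch of this setup might look like the following. Only the /rss-transcribe path comes from the description above; the form field name rssUrl, the helper names, and the choice of POST are assumptions made for illustration, and transcribeFeed stands in for whatever function does the actual work.

```javascript
// Sketch only: `rssUrl`, `registerRoutes`, `createApp` and `transcribeFeed`
// are hypothetical names, not taken from the project's source.
const INDEX_HTML = `<!doctype html>
<form method="post" action="/rss-transcribe">
  <input name="rssUrl" placeholder="Podcast RSS feed URL" size="60">
  <button type="submit">Transcribe</button>
</form>`;

// Returns true only for absolute http(s) URLs.
function isHttpUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false;
  }
}

// Attaches both routes to an Express app.
// `transcribeFeed(rssUrl)` must return a promise resolving to the transcript text.
function registerRoutes(app, transcribeFeed) {
  // Index page with a small form for manual testing.
  app.get('/', (req, res) => {
    res.send(INDEX_HTML);
  });

  app.post('/rss-transcribe', async (req, res) => {
    const rssUrl = req.body && req.body.rssUrl;
    if (!isHttpUrl(rssUrl)) {
      return res.status(400).json({ error: 'rssUrl must be an http(s) URL' });
    }
    try {
      const transcript = await transcribeFeed(rssUrl);
      res.json({ transcript });
    } catch (err) {
      res.status(500).json({ error: err.message });
    }
  });
}

// Builds the app. Express is required lazily so the helpers above
// can be exercised without it being installed.
function createApp(transcribeFeed) {
  const express = require('express'); // npm install express
  const app = express();
  app.use(express.urlencoded({ extended: false })); // HTML form posts
  app.use(express.json()); // JSON clients
  registerRoutes(app, transcribeFeed);
  return app;
}

// Usage: createApp(myTranscribeFeed).listen(3000);
```

Keeping the routes in a function that receives the app makes the handlers easy to exercise with a stub, without starting a real server.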

Parsing the RSS Feed

For those who don’t know, an RSS feed is essentially just an XML file that’s hosted and updated periodically, usually representing some feed of content. For podcasts, each item node contains metadata about an episode and a link to its audio file. When our server gets the URL of the RSS feed, it needs to fetch the XML file and parse it to get the podcast episode entries. Luckily, there’s an npm package for that!

For parsing, we’ll use a great RSS parser package from Robert Brennan called rss-parser. It takes the URL and provides us with a JSON representation of the feed.
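As a rough sketch, fetching and flattening a feed with rss-parser could look like this. The parseURL call and the items array are part of rss-parser; the helper names and the shape of the returned episode objects are assumptions for illustration.

```javascript
// Pure helper: pick out the fields we care about from a parsed feed object.
// Items without an audio enclosure (e.g. text-only posts) are skipped.
function listEpisodes(feed) {
  return (feed.items || [])
    .filter((item) => item.enclosure && item.enclosure.url)
    .map((item) => ({
      id: item.guid || item.enclosure.url, // fall back to the audio URL as an id
      title: item.title || '(untitled)',
      audioUrl: item.enclosure.url,
    }));
}

// Fetches and parses the feed at `rssUrl`, then returns its episodes.
// rss-parser is required lazily so listEpisodes() works without it installed.
async function fetchEpisodes(rssUrl) {
  const Parser = require('rss-parser'); // npm install rss-parser
  const parser = new Parser();
  const feed = await parser.parseURL(rssUrl);
  return listEpisodes(feed);
}

// Usage:
// fetchEpisodes('https://example.com/feed.xml').then((eps) => console.log(eps[0]));
```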

Fetching the Audio

Once I had the JSON of the RSS feed, I was able to find where the podcast audio link was located in the object. I tried a few different feeds and, in each of them, enclosure.url contained the link to the audio file. Initially, I tried a simple HTTP GET on the URL, but I found that podcast audio links often redirect you multiple times, which can be difficult to handle by hand. With the well-known HTTP client package axios, which follows redirects by default in Node.js, this became a non-issue: I was able to get the audio file and write the mp3 to a temporary file for processing.
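One way to do the download, sketched under a few assumptions: the response is streamed to disk rather than buffered in memory, the redirect limit is raised to 10 (an arbitrary value), and the helper names and temp-file naming scheme are made up for this example.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pipeline } = require('node:stream/promises');

// Builds a unique temp-file path, keeping the URL's extension when it has one.
function tempAudioPath(audioUrl) {
  const ext = path.extname(new URL(audioUrl).pathname) || '.mp3';
  return path.join(os.tmpdir(), `podcast-${crypto.randomUUID()}${ext}`);
}

// Downloads `audioUrl` to a temp file and resolves to that file's path.
// axios is required lazily so tempAudioPath() works without it installed.
async function downloadAudio(audioUrl) {
  const axios = require('axios'); // npm install axios
  const response = await axios.get(audioUrl, {
    responseType: 'stream', // response.data is a readable stream
    maxRedirects: 10, // podcast hosts often chain several redirects
  });
  const filePath = tempAudioPath(audioUrl);
  await pipeline(response.data, fs.createWriteStream(filePath));
  return filePath;
}

// Usage:
// downloadAudio(episode.audioUrl).then((file) => console.log('saved to', file));
```

Streaming keeps memory use flat even for multi-hour episodes; remember to delete the temp file once transcription is done.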

Transcribing the Podcast

Once we have the audio file, we can feed it to Picovoice’s Speech-to-Text engine, Leopard. In order to use Leopard, we need to sign up for a free Picovoice Console account. Once signed in, we can get our AccessKey which will grant us free audio transcription as well!

Once we’ve copied our AccessKey, it takes only a couple of lines of code to get a transcription from our file, which we’ll send back in our response.
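A hedged sketch of the transcription step follows. It assumes a recent release of the @picovoice/leopard-node package, in which Leopard is a named export and processFile returns an object with a transcript field; earlier releases may export the class differently or return the transcript as a plain string, so check the version you install. The helper names and the PICOVOICE_ACCESS_KEY environment variable are hypothetical.

```javascript
// Normalises the result of processFile(): accepts either a plain string or an
// object with a `transcript` field (the shape assumed for recent releases).
function extractTranscript(result) {
  return typeof result === 'string' ? result : result.transcript;
}

// Transcribes the audio file at `audioPath` and returns the text.
// The SDK is required lazily so extractTranscript() works without it installed.
function transcribeFile(audioPath, accessKey) {
  const { Leopard } = require('@picovoice/leopard-node'); // npm install @picovoice/leopard-node
  const leopard = new Leopard(accessKey);
  try {
    return extractTranscript(leopard.processFile(audioPath));
  } finally {
    leopard.release(); // free the engine's native resources
  }
}

// Usage (hypothetical env var name):
// const text = transcribeFile('/tmp/episode.mp3', process.env.PICOVOICE_ACCESS_KEY);
```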

Where do we go from here?

Currently, in my test setup, I have the transcription being sent in the response, which I then write to a text file in the browser for download. This allowed me to try out the system end-to-end as a user, but it’s not exactly my dream of auto-transcribing an RSS feed and making a podcast searchable. So what pieces are missing?

For auto-transcribing an RSS feed, we’d need some sort of trigger that makes a call to our endpoint when a new podcast episode is posted. There’s some great tech for automating workflows like this. Zapier, for instance, has a trigger type that does exactly this! We could also implement our own trigger or just poll an RSS feed for updates.
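The polling option can be sketched in a few lines. Everything here is an assumption for illustration: episodes are plain objects with an id field, seen ids are kept in memory only, and the 15-minute default interval is arbitrary.

```javascript
// Returns the episodes whose ids are not yet in `seenIds`, and marks them as seen.
function findNewEpisodes(episodes, seenIds) {
  const fresh = episodes.filter((ep) => !seenIds.has(ep.id));
  fresh.forEach((ep) => seenIds.add(ep.id));
  return fresh;
}

// Calls `fetchEpisodes()` on a timer and hands each unseen episode to `onNewEpisode`.
// Returns the timer handle so the caller can stop polling with clearInterval().
function startPolling(fetchEpisodes, onNewEpisode, intervalMs = 15 * 60 * 1000) {
  const seenIds = new Set();
  let running = false; // guard against overlapping runs when a tick is slow

  const tick = async () => {
    if (running) return;
    running = true;
    try {
      const episodes = await fetchEpisodes();
      for (const ep of findNewEpisodes(episodes, seenIds)) {
        await onNewEpisode(ep);
      }
    } catch (err) {
      console.error('Polling failed:', err.message);
    } finally {
      running = false;
    }
  };

  tick(); // the first run treats the whole back catalogue as new
  return setInterval(tick, intervalMs);
}
```

Because the seen set lives in memory, a restart would reprocess every episode; persisting the ids alongside the transcripts would avoid that.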

For making it searchable, we just need to store and index the transcriptions we generate. Once we have that in place, we’d just need a front end with a search bar and a new endpoint for queries.
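To make the "store and index" idea concrete, here is a toy in-memory inverted index. It is a sketch, not a recommendation: the class name and methods are made up, the tokenizer only handles ASCII letters and digits, and a real deployment would more likely lean on a database or a dedicated search engine.

```javascript
// Maps each word to the set of episode ids whose transcript contains it.
class TranscriptIndex {
  constructor() {
    this.postings = new Map(); // word -> Set of episode ids
    this.titles = new Map(); // episode id -> title
  }

  // Lowercases the text and splits it into simple word tokens.
  static tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9']+/g) || [];
  }

  add(episodeId, title, transcript) {
    this.titles.set(episodeId, title);
    for (const word of new Set(TranscriptIndex.tokenize(transcript))) {
      if (!this.postings.has(word)) this.postings.set(word, new Set());
      this.postings.get(word).add(episodeId);
    }
  }

  // Returns the episodes that contain every word in the query.
  search(query) {
    const words = TranscriptIndex.tokenize(query);
    if (words.length === 0) return [];
    let ids = null;
    for (const word of words) {
      const hits = this.postings.get(word) || new Set();
      ids = ids === null
        ? new Set(hits)
        : new Set([...ids].filter((id) => hits.has(id)));
    }
    return [...ids].map((id) => ({ id, title: this.titles.get(id) }));
  }
}

// Usage:
// const index = new TranscriptIndex();
// index.add('ep1', 'Pilot', transcriptText);
// console.log(index.search('sourdough'));
```

A query endpoint would then only need to call search with the user's input and return the matching episode titles.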

Something to work on for next time! In the meantime, check out the source code and feel free to add your own spin on it. For more information regarding Picovoice SDKs and products, visit the website, docs or explore the GitHub repositories. If your project requires custom wake word or context models, sign up for the Picovoice Console.

Ian Lavery
Picovoice

Polyglot Programmer and Multimedia Artist