Speech Synthesis on the Raspberry Pi Using Node.js

Taron Foxworth

In the past, the easiest way to add speech synthesis to your application was to install Festival, an open-source speech synthesis engine, but the voices would sound too computery. Festival was not very helpful if you needed a lifelike voice.

At the last AWS re:Invent conference, Amazon released Polly, a service meant to be a low-cost, simple way to add speech synthesis to your applications. Because Polly is an Amazon service, you get the benefit of Amazon's AI engine, and, for convenience, you can save the generated audio for offline playback. Best of all, it's very inexpensive:

You can use Polly to process 5 million characters per month at no charge. After that, you pay $0.000004 per character, or about $0.004 per minute of generated audio. That works out to about $0.018 for this blog post, or around $2.40 for the full text of Adventures of Huckleberry Finn.

Amazon Polly – Text to Speech in 47 Voices and 24 Languages

With a generous free tier, it's perfect for all kinds of projects.
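The pricing above is simple per-character arithmetic. As a rough sketch, assuming the quoted rate of $0.000004 per character, here is a hypothetical `estimatePollyCost` helper (not part of any AWS SDK) that can optionally subtract the 5-million-character monthly allowance first:

```javascript
// Rough Amazon Polly cost estimate using the rate quoted above (an assumption;
// check current AWS pricing before relying on it).
const MICRODOLLARS_PER_CHARACTER = 4; // $0.000004 per character
const FREE_CHARACTERS_PER_MONTH = 5e6;

// Estimated cost in dollars for `characters` characters. Pass
// FREE_CHARACTERS_PER_MONTH as the second argument to apply the free allowance.
function estimatePollyCost(characters, freeCharacters = 0) {
  const billable = Math.max(0, characters - freeCharacters);
  // Count in whole microdollars so the only rounding is the final division.
  return (billable * MICRODOLLARS_PER_CHARACTER) / 1e6;
}

console.log(estimatePollyCost(600000)); // → 2.4 (the Huckleberry Finn figure)
console.log(estimatePollyCost(600000, FREE_CHARACTERS_PER_MONTH)); // → 0
```

At this rate, the $0.018 blog-post figure corresponds to about 4,500 characters. Prices change, so treat the constants as placeholders.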

This demo shows you how to use Polly for speech synthesis in your app, and the speech itself will be dynamic: you will be able to have it say anything. It's all powered by Losant and Airtable. To demo this, we are building an office Inspiration Station 🤗

The Inspiration Station is meant to give someone in the Losant office a little bit of inspiration whenever they need it. When you press the big red button, it speaks a quote from Gary Vaynerchuk to you, for that mid-day startup boost you need.

There are a lot of pieces involved, but here is how it works:

  • The red button is wired to the GPIO on the Raspberry Pi.
  • The Raspberry Pi has a Node.js app listening for the button press; the app uses Johnny-Five.
  • Once the button is pressed, it triggers a Losant workflow. This workflow will grab our quotes from Airtable and send a random one back to the Pi.
  • Then, using Amazon Polly, our Pi will say the quote via the connected speaker.

Now that you understand the high-level pieces, let's put it all together:

How It Works

Materials:

Putting Together a List of Quotes

I wanted an easy way to keep a list of quotes that is accessible via an API, and Airtable is perfect for this: it's a spreadsheet-like database backed by a CRUD API.

Here is the table I put together:

After you create your table of quotes in Airtable, Airtable's API documentation will show you the CRUD routes for your spreadsheet database.

Setting Up the Raspberry Pi

In this project, I used the Raspberry Pi 3 because it has Wi-Fi built in, but other versions of the Pi should work just fine.

[Image: Raspberry Pi board]

If you're familiar with the Pi, this should be straightforward. You will need to have your Pi set up and connected to the internet, which you can do via command line or GUI.

Next, you will need to install the latest version of Node.js. At the time of this article, the current version is 7.x. nvm (Node Version Manager), a command-line tool, is an easy way to install and switch between Node.js versions.

Setting Up the Button

This particular button took some soldering. Any button would work; I just wanted the big, fancy red button.

I soldered two jumper wires onto the button's two leads.

[Image: bottom of the button]

Then I plugged the wires into the Raspberry Pi, like so:

[Image: button plugged into the Raspberry Pi]

To keep the circuit simple, I configured the GPIO pin to use the Pi's internal pull-up resistor. Nothing has to be changed physically to do this; it's done in the code, which is explained in a later section.
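With a pull-up, the pin idles HIGH and reads LOW while the button is pressed, assuming the button's other lead is wired to a ground pin so that pressing it connects the GPIO pin to ground. Libraries like Johnny-Five handle this for you, but the idea fits in a few lines. `detectPresses` below is a hypothetical sketch that treats each HIGH-to-LOW transition in a series of pin readings as one press, without any debouncing:

```javascript
// Sketch: turn raw readings from a pulled-up GPIO pin into "press" events.
// Assumes the button connects the pin to ground, so 1 = released, 0 = pressed.
const HIGH = 1;
const LOW = 0;

// Returns the indexes of the readings where a press begins
// (a HIGH-to-LOW transition). No debouncing is attempted here.
function detectPresses(readings) {
  const presses = [];
  let previous = HIGH; // with the pull-up, the pin idles HIGH
  readings.forEach(function (level, index) {
    if (previous === HIGH && level === LOW) {
      presses.push(index);
    }
    previous = level;
  });
  return presses;
}

console.log(detectPresses([HIGH, HIGH, LOW, LOW, HIGH, LOW])); // → [ 2, 5 ]
```

A real button also bounces for a few milliseconds when pressed, which is one reason to let a library generate the events.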

Set Up Losant

We need Losant to capture the button press and send a random @garyvee quote from Airtable to the Raspberry Pi.

First, you'll need to create a Losant account and a new application.

Create Device

Now, let's create a new device in Losant to represent the Raspberry Pi. Give it a boolean device attribute called button. This lets us track when the button is pressed and respond to it in a workflow.
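On the Pi, reporting the button attribute means sending a state object whose key matches the attribute name. Here is a minimal sketch, assuming the losant-mqtt package's `Device` class and its `sendState` method (the article doesn't name the client library). `reportButtonPress` is a hypothetical helper that takes the device as a parameter, so it can be tried with a stand-in object:

```javascript
// Sketch: report the boolean "button" attribute to Losant.
// `device` is expected to be a connected losant-mqtt Device (an assumption);
// any object with a sendState(state) method works, which keeps this testable.
function reportButtonPress(device) {
  // The key must match the attribute name created in Losant ("button").
  const state = { button: true };
  device.sendState(state);
  return state;
}

// Stand-in device that just logs what would be sent.
const fakeDevice = {
  sendState: function (state) {
    console.log("would send", JSON.stringify(state));
  }
};

reportButtonPress(fakeDevice); // prints: would send {"button":true}
```

Each reported state is what fires the Device trigger in the workflow.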

[Image: adding the device in Losant]

Make sure you also create an access key/secret pair; we will use it later to authenticate the device.

Create Workflow

With workflows, we can describe what happens when the button is pressed. In this case, we want to send a command back to the device that contains speech text. Here is the workflow we will be creating:

[Image: the Losant workflow]

This workflow starts with two triggers: Virtual Button and Device. The Virtual Button allows us to trigger the workflow manually (this is helpful for testing). The Device node will trigger this workflow whenever the Raspberry Pi reports state.

HTTP Node

With the HTTP node, we can make a request to Airtable to retrieve the quotes we set up earlier.

[Image: HTTP node settings]

Go to Airtable's API documentation, and grab your API URL and auth token. Then, add them to the HTTP node settings like you see in the screenshot above.
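If you want to check the URL and token before entering them in the HTTP node, you can make the same request from Node.js. This is a sketch, assuming Airtable's list-records endpoint of the form https://api.airtable.com/v0/BASE_ID/TABLE_NAME with the token sent as a bearer token, and a field named text (which the Function node below relies on). The base ID, table name, and token are placeholders; all three function names are hypothetical:

```javascript
const https = require("https");

// Request options for Airtable's "list records" endpoint.
// baseId, tableName, and token are placeholders copied from Airtable's API docs.
function airtableRequestOptions(baseId, tableName, token) {
  return {
    hostname: "api.airtable.com",
    path: "/v0/" + encodeURIComponent(baseId) + "/" + encodeURIComponent(tableName),
    headers: { Authorization: "Bearer " + token }
  };
}

// Pull the non-empty "text" field out of each record in a parsed response body.
function extractQuotes(body) {
  return body.records
    .map(function (record) { return record.fields.text; })
    .filter(function (text) { return typeof text === "string" && text.length > 0; });
}

// Fetch the quotes and call callback(err, quotes).
function fetchQuotes(baseId, tableName, token, callback) {
  https.get(airtableRequestOptions(baseId, tableName, token), function (res) {
    let raw = "";
    res.setEncoding("utf8");
    res.on("data", function (chunk) { raw += chunk; });
    res.on("end", function () {
      if (res.statusCode !== 200) {
        return callback(new Error("Airtable responded with HTTP " + res.statusCode));
      }
      let quotes;
      try {
        quotes = extractQuotes(JSON.parse(raw));
      } catch (err) {
        return callback(err);
      }
      callback(null, quotes);
    });
  }).on("error", callback);
}

module.exports = { airtableRequestOptions, extractQuotes, fetchQuotes };
```

Airtable returns at most 100 records per request (with an `offset` value for the next page), which is plenty for a quote list but worth knowing if yours grows.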

You will also need to tell the HTTP node that you want the response of the API request to be added to the payload. In this case, I'm adding the response to data.airtable.

[Image: HTTP node payload path setting]

Function Node

Now that we have a list of quotes, we need to pick a random one to return. I used the Function node for this.

[Image: Function node]

In the Function node, we can write raw JavaScript, which is a convenient way to interact with the payload. Here is how we can grab a random quote:

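```javascript
// The HTTP node stored the Airtable response at payload.data.airtable.
var records = payload.data.airtable.body.records;

// get a random integer between min and max (both inclusive)
function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// skip if the table is empty, so we don't read fields of undefined
if (records.length > 0) {
  var quoteIndex = getRandomInt(0, records.length - 1);
  payload.data.quote = records[quoteIndex].fields.text; // add the quote back to the payload
}
```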

Device Command

Lastly, we can use the Device Command to send a command back to the Raspberry Pi.

[Image: Device Command node settings]

When configuring the command, set the command name to speak and the command payload to the following:

```json
{ "text": "{{ data.quote }}", "voiceId": "Joanna" }
```

Since we set payload.data.quote in the Function node, we can reference it here in a template.
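One thing to watch with this template: if the quote is inserted into the JSON without escaping, a quote containing a double quote or a line break produces invalid JSON. The sketch below shows the failure with naive string substitution and the fix of JSON-encoding the value first. `renderNaive` and `renderEncoded` are hypothetical stand-ins for the template step, not Losant's template engine, and the sample quote is made up:

```javascript
// Sketch: why values substituted into a JSON template should be JSON-encoded.

// Pastes the quote in as-is, like an unescaped "{{ data.quote }}".
function renderNaive(quote, voiceId) {
  return '{ "text": "' + quote + '", "voiceId": "' + voiceId + '" }';
}

// JSON.stringify adds the surrounding quotes and escapes ", \, and newlines.
function renderEncoded(quote, voiceId) {
  return '{ "text": ' + JSON.stringify(quote) + ', "voiceId": ' + JSON.stringify(voiceId) + " }";
}

const sampleQuote = 'Stop saying "someday".'; // made-up sample text

try {
  JSON.parse(renderNaive(sampleQuote, "Joanna"));
} catch (err) {
  console.log("naive template produced invalid JSON");
}

console.log(JSON.parse(renderEncoded(sampleQuote, "Joanna")).text); // → Stop saying "someday".
```

If your quotes may contain double quotes, check whether your template engine offers a JSON-encoding helper before relying on plain substitution.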

Configure AWS Keys

To use Amazon Polly, you will need an AWS account. (Remember, the first 5 million characters each month are free.) To access the account from the Pi, we need to create an AWS access key/secret pair within the AWS account. If you don't have one yet, see the AWS documentation on creating access keys. Once you have it, create a file on the Raspberry Pi at ~/.aws/credentials with this content:

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

The credentials file is what the Node.js aws-sdk module uses by default to authenticate with your AWS account. Be careful with these keys; in the wrong hands, they could do real damage.
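The SDK parses this file itself, so you don't need to. If you want to sanity-check it on the Pi, though, here is a small sketch that reads the INI-style file and reports whether the [default] profile has both keys, without printing the secret. It handles only the simple `key = value` layout shown above (it is not a full INI parser), and all three function names are hypothetical:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Parse the simple [profile] / key = value layout into { profile: { key: value } }.
function parseCredentials(text) {
  const profiles = {};
  let current = null;
  text.split(/\r?\n/).forEach(function (rawLine) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#") || line.startsWith(";")) return; // blank or comment
    const section = line.match(/^\[(.+)\]$/);
    if (section) {
      current = section[1].trim();
      profiles[current] = profiles[current] || {};
      return;
    }
    const eq = line.indexOf("=");
    if (current !== null && eq > 0) {
      profiles[current][line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
    }
  });
  return profiles;
}

// True if the default profile has non-empty key ID and secret entries.
function checkDefaultProfile(text) {
  const profile = parseCredentials(text)["default"] || {};
  return Boolean(profile.aws_access_key_id && profile.aws_secret_access_key);
}

// Reads ~/.aws/credentials for the current user; call this on the Pi.
function checkCredentialsFile() {
  const file = path.join(os.homedir(), ".aws", "credentials");
  return checkDefaultProfile(fs.readFileSync(file, "utf8"));
}

module.exports = { parseCredentials, checkDefaultProfile, checkCredentialsFile };
```

Run the check the same way you will start the app (with or without sudo), since sudo can change which home directory is used.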

Download the GitHub Project

SSH into your Raspberry Pi, pull down the project code, and install its dependencies:

```shell
git clone git@github.com:Losant/inspiration-station.git
cd inspiration-station
npm install
```

Code Walkthrough

The project has one main file, inspiration-station.js. Let's go through its major pieces.

First, let's look at controlling the GPIO.

This part of the file initializes Johnny-Five, a robotics library for easily controlling microcontrollers, like the Arduino, and system-on-chip boards, like the Raspberry Pi.

We create a board that uses the Raspberry Pi IO plugin, so Johnny-Five can listen to the Pi's GPIO.
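The project's own code lives in inspiration-station.js; as a rough illustration of the idea, here is a sketch of a board that listens for presses on a pulled-up button, assuming the johnny-five and raspi-io packages. The pin name "P1-7" (physical header pin 7) is a hypothetical choice, `isPullup` relies on the IO plugin supporting pull-ups, and the libraries are required inside `startButton` only so `buttonOptions` can be loaded without them:

```javascript
// Sketch: listen for button presses with Johnny-Five on a Raspberry Pi.
// Assumes johnny-five and raspi-io are installed; "P1-7" is a placeholder pin.
const BUTTON_PIN = "P1-7";

// isPullup asks the IO plugin to enable the pin's internal pull-up,
// so Johnny-Five reports a press when the pin is pulled LOW.
function buttonOptions(pin) {
  return { pin: pin, isPullup: true };
}

// Creates the board and calls onPress() each time the button is pressed.
function startButton(onPress) {
  // Required here (not at the top) so buttonOptions works without these packages.
  const five = require("johnny-five");
  const raspi = require("raspi-io");
  // Newer raspi-io versions export { RaspiIO }; older ones export the class itself.
  const RaspiIO = raspi.RaspiIO || raspi;

  const board = new five.Board({ io: new RaspiIO(), repl: false });
  board.on("ready", function () {
    const button = new five.Button(buttonOptions(BUTTON_PIN));
    button.on("press", onPress);
  });
  return board;
}

module.exports = { buttonOptions, startButton };
```

For example, `startButton(function () { console.log("pressed"); })` logs each press once the board is ready.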

Now, let's take a look at how the speaker works:

The speaker Node.js module makes it easy to play audio through your speaker. It is a writable stream, so you pipe audio data into it. If you are not familiar with streams, @substack's Stream Handbook is a helpful resource.

We also create a Readable stream so we can push an audio buffer (which we'll get from Amazon Polly) into it later.
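Here is a sketch of that plumbing, assuming the speaker package and Polly's raw "pcm" output format, which Amazon documents as signed 16-bit, mono, little-endian samples (16,000 Hz by default). The `Speaker` is created inside `playThrough` so the stream helpers can be loaded without the native module; the function names are hypothetical:

```javascript
const { Readable } = require("stream");

// Sample rate assumed to match the rate requested from Polly.
const SAMPLE_RATE = 16000;

// Speaker settings for 16-bit mono PCM.
function speakerFormat(sampleRate) {
  return { channels: 1, bitDepth: 16, sampleRate: sampleRate };
}

// A Readable we push audio buffers into whenever Polly returns them.
// read() is a no-op because data is pushed in from outside, not pulled.
function createAudioStream() {
  return new Readable({ read: function () {} });
}

// Pipe the audio stream into the speaker (a Writable stream).
function playThrough(audioStream) {
  const Speaker = require("speaker"); // native module, loaded only when playing
  const speaker = new Speaker(speakerFormat(SAMPLE_RATE));
  audioStream.pipe(speaker);
  return speaker;
}

module.exports = { SAMPLE_RATE, speakerFormat, createAudioStream, playThrough };
```

Piping one long-lived stream into the speaker once means each new clip only has to be pushed, rather than opening the audio device for every quote.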

On to Amazon Polly:

Now, we can initialize the aws-sdk to interact with Amazon Polly. I'm also creating a function called speak that will accept text and voiceId — because you can configure multiple voices in Amazon Polly. We can call speak when we get a device command from Losant.

Notice that in the speak function, we push the audio stream into the readable stream we created earlier:

```javascript
AudioStream.push(data.AudioStream)
```
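A sketch of such a `speak` function, assuming the AWS SDK for JavaScript v2 (`aws-sdk`), whose `Polly.synthesizeSpeech` returns the audio as a Buffer in `data.AudioStream`. The region is a placeholder (the credentials file above doesn't set one), and the audio stream is passed in as a parameter rather than shared as a global:

```javascript
// Sketch of a speak() helper using the AWS SDK for JavaScript v2 ("aws-sdk").
let polly = null;

// Parameters for Polly's synthesizeSpeech call. "pcm" returns raw 16-bit,
// mono, little-endian samples that can be piped straight into a speaker.
function speechParams(text, voiceId) {
  return {
    OutputFormat: "pcm",
    SampleRate: "16000", // Polly takes the sample rate as a string
    Text: text,
    VoiceId: voiceId || "Joanna" // fall back to the voice used in the workflow
  };
}

// Create the Polly client on first use, so speechParams works without aws-sdk installed.
function getPolly() {
  if (polly === null) {
    const AWS = require("aws-sdk");
    polly = new AWS.Polly({ region: "us-east-1" }); // placeholder region
  }
  return polly;
}

// Synthesize `text` and push the audio into `audioStream`, a Readable that is
// already piped into the speaker. callback(err) is optional.
function speak(text, voiceId, audioStream, callback) {
  getPolly().synthesizeSpeech(speechParams(text, voiceId), function (err, data) {
    if (!err) {
      audioStream.push(data.AudioStream); // a Buffer in Node.js
    }
    if (callback) callback(err || null);
  });
}

module.exports = { speechParams, speak };
```

The requested sample rate has to match the rate the speaker is opened with, or playback will sound too fast or too slow.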

Finally, connect it all to Losant:

We configure the Pi to send state to Losant and receive commands from it. When a command arrives, we pass its payload data to the speak function we defined earlier.

It's important that you update device-id, access-key, and access-secret in inspiration-station.js with the values for your device and access key in Losant.
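A sketch of that wiring, assuming the losant-mqtt package's `Device` class (constructed with `id`, `key`, and `secret`, and emitting a `command` event whose object has `name` and `payload`). Reading the values from environment variables is a hypothetical alternative to editing them into the file, and the variable names are made up. `handleCommand` is kept separate so the dispatch logic can be tried without a connection:

```javascript
// Route a Losant command to a speak(text, voiceId) function.
// Only the "speak" command configured in the workflow is handled.
function handleCommand(command, speak) {
  if (command.name !== "speak" || !command.payload) return false;
  speak(command.payload.text, command.payload.voiceId);
  return true;
}

// Connect to Losant and speak whenever a "speak" command arrives.
function connectToLosant(speak) {
  const Device = require("losant-mqtt").Device; // loaded lazily so handleCommand works without it
  const device = new Device({
    id: process.env.LOSANT_DEVICE_ID, // hypothetical variable names
    key: process.env.LOSANT_ACCESS_KEY,
    secret: process.env.LOSANT_ACCESS_SECRET
  });
  device.connect();
  device.on("command", function (command) {
    handleCommand(command, speak);
  });
  return device;
}

module.exports = { handleCommand, connectToLosant };
```

The returned device is also the object to call `sendState({ button: true })` on when the button is pressed, matching the button attribute created earlier.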

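Starting the Station

To start the app, run this command from the project directory:

```shell
sudo node inspiration-station.js
```

If you installed Node.js with nvm, sudo may not find the nvm-managed node binary, because sudo typically resets PATH; running sudo "$(which node)" inspiration-station.js works around that.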

Now, if you press the button, it should follow the steps we described above:

  • Once the button is pressed, it triggers a Losant workflow. This workflow grabs our quotes from Airtable and sends a random one back down to our device.
  • Then, using Amazon Polly, our device says the quote via the connected speaker.

All Done

You can never have too much ❤️ in the office. Let me know your feedback in the comments or on Twitter.

Until next time, Stay Connected! 🦊
