Hey everyone!
Ever since I started programming, for some reason, I always thought it would be so cool to program my very own voice assistant. As it turns out, it's not that hard, and I'll show you how to very easily create one!
Disclaimer: the browser compatibility for this project has only been tested on Chrome, so there may be some compatibility issues on other browsers and mobile devices.
Okay, so first, let's start with a basic setup of our project. Let's create 3 files: index.html, style.css, and script.js. If you're using Replit for this project, which I highly recommend, these three files should already be created with the HTML/CSS/JS template.
The style.css and script.js files should be empty for now, but put this HTML snippet in the HTML file if it's not there already:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
<title>Voice Assistant</title>
<link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<script src="script.js"></script>
</body>
</html>
Next, let's set up the frontend elements we need for this voice assistant. Since most of a voice assistant's work happens in the JavaScript, we won't need much markup. We'll only need 3 elements:
- A button for the user to click to have the voice assistant start listening, with an id of "listen-button". When the user clicks the button, we will call the function listen(), which we have not defined yet, but I'll talk about that later.
- An input area to display the speech-to-text result of what we are saying, with an id of "input-area"
- An output area to display the result of the voice assistant, with an id of "output-area"
We'll put all 3 of these elements inside a div, and the finished HTML file should look like this:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
<title>Voice Assistant</title>
<link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="main-container">
<p id="input-area">...</p>
<p id="output-area">...</p>
<button id="listen-button" onclick="listen()">Listen</button>
</div>
<script src="script.js"></script>
</body>
</html>
Since the items are a little jumbled together with no styling, let's just put this basic piece of code in the CSS file:
#main-container {
text-align: center;
border: 1px solid black;
padding: 1em;
}
This should be your result so far:
I get that the page still looks trashy without proper CSS styling, but I'm not going to get into that in this tutorial. There are plenty of CSS tutorials out there if you would like to make your voice assistant look better.
Now that the HTML is out of the way, let's get into the fun stuff: actually making the voice assistant work.
The first part of the voice assistant that we need is some way to get the computer to listen to us, receive microphone input, then turn that speech into text. This would normally be very complicated, but thankfully, we have an API (Application Programming Interface) that can do this very easily for us, called the Web Speech API.
So, to use this, let's first create a function in the script.js file called listen(). We'll call this function when the user clicks the Listen button that we created earlier in the HTML.
function listen() {
}
Inside of that function, we'll create an easy way to access our HTML elements:
function listen() {
let inputArea = document.getElementById('input-area')
let outputArea = document.getElementById('output-area')
}
And set up our speech recognition:
function listen() {
let inputArea = document.getElementById('input-area')
let outputArea = document.getElementById('output-area')
var recognition = new webkitSpeechRecognition();
recognition.lang = "en-GB";
recognition.start();
}
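Since webkitSpeechRecognition is the prefixed name that Chrome exposes, a slightly more defensive setup could look for an unprefixed SpeechRecognition constructor first and fall back to the prefixed one. Here is a minimal sketch of that idea. The helper names getRecognitionConstructor and createRecognition are made up for this example and are not part of the API:

```javascript
// Hypothetical helper: returns the speech recognition constructor exposed by
// the given global object, or null when none is available.
function getRecognitionConstructor(globalObject) {
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null;
}

// Hypothetical helper: builds a recognizer with the given language, or
// returns null so the caller can show a message instead of throwing.
function createRecognition(globalObject, lang) {
  const Recognition = getRecognitionConstructor(globalObject);
  if (!Recognition) {
    return null;
  }
  const recognition = new Recognition();
  recognition.lang = lang;
  return recognition;
}
```

Inside listen(), you could call createRecognition(window, "en-GB") and write a "not supported" message to the output area if it returns null.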
Then, we will check for a result, and when the recognition gets a result, we'll store that data inside a variable called transcript and then display that data in the inputArea that we created in the HTML.
Here's what that would look like:
function listen() {
  let inputArea = document.getElementById('input-area')
  let outputArea = document.getElementById('output-area')
  var recognition = new webkitSpeechRecognition();
  recognition.lang = "en-GB";
  recognition.start();
  // Runs once the recognizer has turned our speech into text
  recognition.onresult = function(event) {
    // First alternative of the first result
    let transcript = event.results[0][0].transcript;
    inputArea.innerHTML = transcript;
  }
}
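The event.results[0][0] lookup reads the first alternative of the first result. To make that structure a bit more visible, here is a small sketch. firstTranscript is a hypothetical helper, and the empty-list guard and the trim() are extra precautions of mine, not something the tutorial requires:

```javascript
// Hypothetical helper: `results` is an array-like list of results, and each
// result is an array-like list of alternatives with a `transcript` string.
// Returns the first alternative's transcript, or "" if there is none.
function firstTranscript(results) {
  if (!results || results.length === 0 || results[0].length === 0) {
    return "";
  }
  return results[0][0].transcript.trim();
}
```

In the onresult handler, this would be called as firstTranscript(event.results).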
Now, let's run this program and see what happens. But please note: the program will not run properly in an iframe or anything else that isn't a full browser window. The API needs to access your microphone through the browser, so please open the project in a new tab.
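If microphone access is blocked or nothing is heard, the recognizer reports an error event instead of a result. As a rough sketch, you could turn the event's error code into a readable message. describeRecognitionError is a hypothetical helper and the message wording is made up; the codes shown are ones the Web Speech API defines:

```javascript
// Hypothetical helper: maps the `error` code of a recognition error event to
// a message for the output area. The message texts are invented for this demo.
function describeRecognitionError(errorCode) {
  const messages = new Map([
    ["not-allowed", "Microphone access was blocked. Please allow it and try again."],
    ["no-speech", "I didn't hear anything. Please try again."],
    ["audio-capture", "No microphone was found."],
    ["network", "The speech service could not be reached."]
  ]);
  // Fall back to a generic message for codes not listed above
  return messages.get(errorCode) || "Something went wrong: " + errorCode;
}
```

To use it, you could assign a function to recognition.onerror that sets outputArea.innerHTML to describeRecognitionError(event.error).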
Okay, so here's what should happen if you did everything correctly:
If you open the project in a new tab and click the "Listen" button, you should get this prompt:
Click "Allow," and then try speaking! Say "Hello!"
The program should display the result like so:
Awesome! The program can show what we're saying on the screen! However, this is only half of the voice assistant. The voice assistant should take the information of what we said and then do something: reply to us, give us information, etc.
This is very easy to add! Since we have the text stored in the transcript variable, let's create a simple if statement to check, let's say, whether we said "hello," like this:
if (transcript == "hello") {
outputArea.innerHTML = "Hello, User!"
}
Now, we can place that code right here, in the recognition.onresult function:
recognition.onresult = function(event) {
  let transcript = event.results[0][0].transcript;
  if (transcript == "hello") {
    outputArea.innerHTML = "Hello, User!"
  }
  inputArea.innerHTML = transcript;
}
So, now if we say "hello," the program should output "Hello, User!"
This is great, but what if someone said, "Hello voice assistant," or something else that included the word "hello"? Our voice assistant wouldn't understand, because it's only programmed to respond if the user says exactly "hello." However, JavaScript strings have a handy method called includes() that checks whether a string contains a given piece of text. Thus, instead, we can do this:
if (transcript.includes("hello")) {
outputArea.innerHTML = "Hello, User!"
}
Now, if the user says something that includes the word "hello," the program will output "Hello, User!" Great, right?
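One thing to watch for: includes() is case-sensitive, and the recognizer may hand back text with capital letters or punctuation, such as "Hello." A minimal sketch of cleaning up the text before checking it could look like this. normalizeTranscript and mentions are hypothetical helper names, and the regular expression assumes English text:

```javascript
// Hypothetical helper: lowercases the text, replaces anything that is not a
// letter, digit, apostrophe, or whitespace with a space, then collapses
// repeated whitespace. Assumes English (ASCII) words.
function normalizeTranscript(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s']/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: true when `word` appears as a whole word, so that
// "hello" does not match inside a longer word.
function mentions(transcript, word) {
  return normalizeTranscript(transcript).split(" ").includes(word);
}
```

With this in place, the condition could be written as mentions(transcript, "hello") instead of transcript.includes("hello").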
Now, with what we've learned so far, let's create two more conditionals: one to give us the weather, and another one to alert us if the program doesn't know what we're trying to say, because currently, the program just does nothing if it doesn't know what we're saying.
For the weather conditional, we'll use an else if statement below the if statement to open a weather website if the user wants the weather. We can do that like so:
if (transcript.includes("hello")) {
outputArea.innerHTML = "Hello, User!"
} else if (transcript.includes("weather")) {
window.open("https://www.google.com/search?q=weather")
} else {
outputArea.innerHTML = "I don't know what you mean."
}
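As the list of conditionals grows, one possible alternative is to keep the keywords in an array and look up the first match. This is only a sketch of that design, not a replacement you have to use. The names commands, findCommand, and runCommand are made up for this example:

```javascript
// Hypothetical command table: each entry pairs a keyword with either a reply
// to display or a URL to open. The first matching entry wins.
const commands = [
  { keyword: "hello", reply: "Hello, User!" },
  { keyword: "weather", url: "https://www.google.com/search?q=weather" }
];

// Returns the first command whose keyword appears in the transcript, or a
// fallback reply when nothing matches.
function findCommand(transcript) {
  const text = transcript.toLowerCase();
  const match = commands.find((command) => text.includes(command.keyword));
  return match || { reply: "I don't know what you mean." };
}

// Applies a command: opens its URL if it has one, otherwise shows its reply.
// `openUrl` is passed in so the function can be tried outside a browser.
function runCommand(command, outputArea, openUrl) {
  if (command.url) {
    openUrl(command.url);
  } else {
    outputArea.innerHTML = command.reply;
  }
}
```

In the onresult handler, this could be used as runCommand(findCommand(transcript), outputArea, (url) => window.open(url)). Adding a new feature then only means adding one more entry to the array.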
This voice assistant is really coming along! However, I'm going to end the tutorial here. There are still a lot of things you can do, though. Here's a list of features you can add!
- Add more conditionals so that the voice assistant can do more!
- Add better CSS styling!
- Add randomized responses, by storing an array of responses, and getting a random element from them (https://stackoverflow.com/questions/4550505/getting-a-random-value-from-a-javascript-array)
- Turn this into a real voice assistant by having the program respond with a synthesized voice using another API, like this one: https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis
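To give you a head start on the last two ideas, here is a rough sketch of picking a random response and reading it aloud. The greetings array, pickRandom, and speak are names I made up for this example, and speak simply does nothing where speech synthesis isn't available:

```javascript
// Example responses; the wording is made up for this sketch.
const greetings = ["Hello, User!", "Hi there!", "Good to hear from you!"];

// Picks a random element from an array. `random` defaults to Math.random
// and can be swapped for a fixed function when testing.
function pickRandom(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

// Asks the browser to speak the text aloud. Returns false without doing
// anything when speech synthesis is not available.
function speak(text) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return false;
  }
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}
```

In the "hello" branch, you could then store pickRandom(greetings) in a variable, show it in the output area, and pass it to speak().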
Thanks for reading this tutorial, and I hope you learned something! Happy Coding!!
Top comments (2)
This only works on Chrome, Edge, and Safari.
Opera, IE, and Firefox do not support this API
Yes, here is the full list for browser compatibility:
dev-to-uploads.s3.amazonaws.com/up...