Learn how to build a speech-to-text application in JavaScript using the speech recognition interface of the Web Speech API. This guide walks you through each step of implementing a feature that converts spoken words into written text, which can improve accessibility and make it easier to capture information.
Getting Started
Building the User Interface
- Create a file named 'speech.html'.
- Add the following HTML code to design the UI for the speech-to-text app:
```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <!-- Required for Bootstrap's responsive layout on mobile devices -->
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Speech to Text in JavaScript</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@4.6.0/dist/css/bootstrap.min.css">
  </head>
  <body>
    <div class="container">
      <h1 class="text-center mt-5">
        Speech to Text in JavaScript
      </h1>
      <div class="form-group">
        <!-- Transcribed text appears here -->
        <textarea id="textbox" rows="6" class="form-control"></textarea>
      </div>
      <div class="form-group">
        <button id="start-btn" class="btn btn-danger btn-block">
          Start
        </button>
        <!-- Status messages from the recognition event handlers -->
        <p id="instructions">Press the Start button</p>
      </div>
    </div>
    <!-- jQuery must load before script.js, which uses $ -->
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
    <script src="script.js"></script>
  </body>
</html>
```
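In the code above, we used HTML and Bootstrap to build a simple, responsive layout. The Start button triggers speech recognition, the textarea displays the transcribed text, and the paragraph below the button shows status messages. jQuery is loaded before `script.js` because the script uses it to access these elements.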
Your UI should look like this:
Writing the JavaScript Code
Create a file named `script.js` in the same folder as `speech.html` and paste the following code inside:
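```javascript
// Use the standard constructor where available, falling back to the webkit-prefixed one
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var textbox = $("#textbox");
var instructions = $("#instructions");
// Text transcribed so far (plus any manual edits made in the textarea)
var content = '';

if (!SpeechRecognition) {
  // Unsupported browser: explain why and disable the Start button
  instructions.text("Speech recognition is not supported in this browser");
  $("#start-btn").prop("disabled", true);
} else {
  var recognition = new SpeechRecognition();
  // Return results for consecutive phrases instead of stopping after the first one
  recognition.continuous = true;

  // recognition is started (fires after microphone access is granted)
  recognition.onstart = function() {
    instructions.text("Voice Recognition is On");
  };

  // the service stopped detecting speech
  recognition.onspeechend = function() {
    instructions.text("No Activity");
  };

  // event.error holds a short code such as "no-speech" or "not-allowed"
  recognition.onerror = function(event) {
    instructions.text("Try Again (" + event.error + ")");
  };

  // append every result that changed in this event, starting at resultIndex
  recognition.onresult = function(event) {
    for (var i = event.resultIndex; i < event.results.length; i++) {
      content += event.results[i][0].transcript;
    }
    textbox.val(content);
  };

  $("#start-btn").click(function() {
    recognition.start();
  });
}

// keep content in sync when the user edits the textarea by hand
textbox.on('input', function() {
  content = $(this).val();
});
```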
In the code above, we accessed the speech recognition constructor of the Web Speech API (exposed as `webkitSpeechRecognition` in Chromium-based browsers) and created an instance stored in the `recognition` variable.
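Browser support for speech recognition varies, and some browsers expose the constructor only under the webkit prefix. It can therefore help to look up both names and check the result before calling `new`. The sketch below assumes a hypothetical helper named `getSpeechRecognition`. It takes the global object as a parameter (`window` in the browser) so it can also be exercised outside a browser:

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor exposed by
// the given global object, or null when the browser has no support.
function getSpeechRecognition(globalObj) {
  // Prefer the unprefixed name; fall back to the webkit-prefixed one
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In the browser:
// var Ctor = getSpeechRecognition(window);
// if (Ctor) { var recognition = new Ctor(); } else { /* show a fallback message */ }
```

Returning `null` rather than throwing leaves the page free to show a message or hide the Start button in unsupported browsers.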
Next, we used jQuery to get references to the `#textbox` and `#instructions` elements defined in the HTML, so we can update them from our code.
We also created a `content` variable that holds the text the application has transcribed so far and shows in the textarea. It starts as an empty string because nothing has been converted yet.
We then set the `continuous` property of the `recognition` object to `true`. As a result, the API keeps listening to the user's microphone and returns results across pauses, instead of stopping after the first phrase.
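Besides `continuous`, the recognition object has a few other properties you can set before calling `start()`:

- `lang`: a BCP 47 language tag such as `"en-US"`.
- `interimResults`: whether to deliver provisional results while the user is still speaking.
- `maxAlternatives`: how many candidate transcripts to return for each result.

The sketch below assumes a hypothetical `configureRecognition` helper. Its default values are illustrative choices, not values the API requires:

```javascript
// Hypothetical helper: applies recognition settings in one place.
// The defaults below are illustrative choices, not values required by the API.
function configureRecognition(recognition, options) {
  const opts = options || {};
  // Keep returning results across pauses instead of stopping after one phrase
  recognition.continuous = opts.continuous !== undefined ? opts.continuous : true;
  // Only deliver final results unless the caller asks for interim ones
  recognition.interimResults = opts.interimResults !== undefined ? opts.interimResults : false;
  // BCP 47 language tag for the speech being recognized
  recognition.lang = opts.lang || "en-US";
  // Number of alternative transcripts to return for each result
  recognition.maxAlternatives = opts.maxAlternatives || 1;
  return recognition;
}

// Browser usage: configureRecognition(recognition, { lang: "en-GB" });
```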
We attached a click handler to the Start button that calls `recognition.start()`. This starts the recognition service, which then listens for input from the user's microphone.
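Calling `start()` on a recognition object that is already running throws an `InvalidStateError`, so clicking Start again during a session causes an error. One way to avoid this is to track the listening state with the `start` and `end` events and let the same button stop the session. The sketch below assumes a hypothetical `createToggle` helper. It registers listeners with `addEventListener` so that existing `onstart`/`onend` handlers keep working:

```javascript
// Hypothetical helper: returns a function that starts recognition when idle
// and stops it when listening, so repeated clicks never call start() twice.
function createToggle(recognition) {
  let listening = false;
  let pending = false; // start() was called but the start event has not fired yet
  // addEventListener keeps any existing onstart/onend handlers intact
  recognition.addEventListener("start", function () {
    listening = true;
    pending = false;
  });
  // end also fires after an error, so the toggle never gets stuck
  recognition.addEventListener("end", function () {
    listening = false;
    pending = false;
  });
  return function toggle() {
    if (listening) {
      recognition.stop(); // ends the session; an end event follows
    } else if (!pending) {
      pending = true;
      recognition.start();
    }
  };
}

// Browser usage: $("#start-btn").click(createToggle(recognition));
```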
When you press the button, your browser will request permission to use your microphone, as shown in the image below.
To bring the application to life, we also assigned four event handlers to the `recognition` object: `onstart`, `onspeechend`, `onerror`, and `onresult`.
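Each `on…` property holds a single function, so assigning a second one replaces the first. The recognition object is also an `EventTarget`, so you can register the same handlers with `addEventListener` under the event names `start`, `speechend`, `error`, and `result`, and several listeners can coexist. The sketch below assumes a hypothetical `attachHandlers` helper that takes an object keyed by event name:

```javascript
// Hypothetical helper: registers one listener per event name and returns a
// function that removes them all again.
function attachHandlers(target, handlers) {
  const names = Object.keys(handlers);
  names.forEach(function (name) {
    target.addEventListener(name, handlers[name]);
  });
  return function detach() {
    names.forEach(function (name) {
      target.removeEventListener(name, handlers[name]);
    });
  };
}

// Browser usage with the article's jQuery references:
// attachHandlers(recognition, {
//   start: function () { instructions.text("Voice Recognition is On"); },
//   speechend: function () { instructions.text("No Activity"); },
//   error: function () { instructions.text("Try Again"); }
// });
```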
The `onstart` handler runs when the recognition service has started listening, which happens only after the user grants microphone access. Here, the application tells the user that voice recognition is on.
The `onresult` handler runs when the recognition service has converted speech from the user's microphone into text. The recognized text is available through `event.results`, a list in which each result holds one or more alternative transcripts.
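Each entry in `event.results` is a result object with an `isFinal` flag. It also holds one or more alternatives, each with a `transcript` and a `confidence`. The sketch below flattens the results that changed in one event into plain objects. The `readResults` name is hypothetical, and it keeps only the first alternative, as the article's code does. The test feeds it a hand-built object that mimics this shape rather than a real browser event:

```javascript
// Hypothetical helper: converts the results that changed in a result event
// into plain objects, keeping only the first alternative of each result.
function readResults(event) {
  const out = [];
  // Results before resultIndex were already delivered by earlier events
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const first = result[0]; // same alternative the article's code reads
    out.push({
      transcript: first.transcript,
      confidence: first.confidence,
      isFinal: result.isFinal
    });
  }
  return out;
}
```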
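In this handler, we read the new transcript from `event.results`, starting at `event.resultIndex` (the first result that changed in this event). We append it to the `content` variable and write the updated text into the textarea. We also attached an `input` handler to the textarea that copies any manual edits back into `content`. Later transcriptions are then appended to the edited text instead of overwriting it.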
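Two details matter when appending transcripts:

- A new transcript may or may not begin with a space, so plain concatenation can run words together.
- If `interimResults` is turned on, non-final results are replaced by later events and should not be appended permanently.

The sketch below handles both in a pure function. It assumes the hypothetical name `mergeTranscript` and a result shape like `event.results`:

```javascript
// Hypothetical helper: returns the committed text plus any interim text for
// one result event. Only final results are added to the committed text.
function mergeTranscript(committed, event) {
  let finalText = committed;
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const piece = event.results[i][0].transcript.trim();
    if (piece === "") continue;
    if (event.results[i].isFinal) {
      // Join the existing text and the new piece with exactly one space
      const base = finalText.trimEnd();
      finalText = base === "" ? piece : base + " " + piece;
    } else {
      interimText += (interimText === "" ? "" : " ") + piece;
    }
  }
  return { committed: finalText, interim: interimText };
}

// Browser usage inside onresult:
// var merged = mergeTranscript(content, event);
// content = merged.committed;
// textbox.val(content + (merged.interim ? " " + merged.interim : ""));
```

Keeping interim text out of `content` means a provisional guess never becomes part of the saved transcript. The final result replaces it when it arrives.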
The application is now complete. Click the Start button and speak, and the application converts your speech into text and fills the textarea with the transcription.
The `onerror` handler runs when an error occurs during recognition, for example when no speech is detected or microphone access is denied. When this happens, the application asks the user to try again via the instructions paragraph.
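The error event carries an `error` property with a short code, so the handler can show a more specific message than a generic "Try Again". The codes below are defined in the Web Speech API specification. The messages and the `describeError` name are hypothetical choices for illustration:

```javascript
// Hypothetical mapping from SpeechRecognitionErrorEvent.error codes to
// user-facing messages; the wording is illustrative.
const ERROR_MESSAGES = {
  "no-speech": "No speech was detected. Try Again.",
  "audio-capture": "No microphone was found.",
  "not-allowed": "Microphone access was blocked.",
  "service-not-allowed": "Speech recognition is not allowed on this page.",
  "network": "A network error interrupted recognition.",
  "aborted": "Recognition was stopped.",
  "language-not-supported": "The selected language is not supported."
};

function describeError(code) {
  // Own-property check so names like "toString" do not match inherited members;
  // unknown codes fall back to the article's generic message
  return Object.prototype.hasOwnProperty.call(ERROR_MESSAGES, code)
    ? ERROR_MESSAGES[code]
    : "Try Again";
}

// Browser usage:
// recognition.onerror = function (event) {
//   instructions.text(describeError(event.error));
// };
```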
The `onspeechend` handler runs when the service stops detecting speech from the microphone and the application becomes idle. When this happens, the application shows "No Activity" in the instructions paragraph.
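The `speechend` event only means the service stopped hearing speech. The session itself finishes with a separate `end` event, and a browser may end a session after a period of silence even when `continuous` is `true`. If the app should keep listening until the user decides to stop, one option is to call `start()` again from the `end` event. The sketch below assumes a hypothetical `keepListening` helper. It returns a function that a Stop action could call, and it gives up after permission errors so it does not retry forever:

```javascript
// Hypothetical helper: restarts recognition whenever the session ends,
// until the returned stop function is called.
function keepListening(recognition) {
  let stopped = false;
  recognition.addEventListener("error", function (event) {
    // Permission errors will not fix themselves, so do not keep retrying
    if (event.error === "not-allowed" || event.error === "service-not-allowed") {
      stopped = true;
    }
  });
  recognition.addEventListener("end", function () {
    // Restart only if the session was not ended on purpose
    if (!stopped) {
      recognition.start();
    }
  });
  recognition.start();
  return function stop() {
    stopped = true;
    recognition.stop();
  };
}
```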
Conclusion
In this article, you learned how to build a JavaScript-based speech-to-text application using the Web Speech API. You built a simple UI, connected event handlers, and used the API to convert speech into text as you speak. This project demonstrates how JavaScript, HTML, and CSS can work together to create accessible, interactive web applications.