Alvison Hunter Arnuero | Front-End Web Developer

Posted on Nov 15, 2021 • Edited on Mar 25, 2023

Speech Recognition Fundamentals using Python

#python #webdev #programming #computerscience

Howdy, dear folks! How is it going today on that side of planet Earth?

Have you ever wondered how to get started with Speech Recognition using the coding simplicity of Python? Would you like to tell the computer what to search for online using only your voice, from within your own script? Well, if so, let’s build our first “Hello World” together using this amazing technology, shall we, folks?

First, as usual, we need to use some of those wonderful modules out there waiting for us to build awesome scripts. For Speech Recognition, one of the very first things that happens is precisely recognizing our voices. That is done by a popular class called Recognizer, the centerpiece of the module we will use in this article.

When we create an instance of this class, it takes on the task of recognizing speech. This, of course, has several settings that need to be configured in order to work more efficiently when recognizing speech from an audio source.

But wait a minute, hold those horses, can ye? Which package or module are we going to use in this article to get our hands dirty with the Speech Recognition feature once and for all? Well, let me tell ya, dear pal: we will use the SpeechRecognition package.

Each instance of the Recognizer class has seven methods for recognizing speech from an audio source using various APIs. These are:

Microsoft Bing Speech: The famous Microsoft Bing Speech recognition engine.
Google Web Speech API: This is the one we will use in this article.
Google Cloud Speech: This tool requires installation of the google-cloud-speech package.
SoundHound Houndify: Conversational intelligence and advanced independent voice AI platform.
IBM Speech Engine: IBM Speech to Text engine.
CMUSphinx: Open-source speech recognition toolkit; requires installing PocketSphinx.
Wit.ai: The natural language experiences speech engine.

Of these seven soldiers, only recognize_sphinx works offline with the CMU Sphinx engine. The other six all require an internet connection.
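To make that offline versus online split a bit more tangible, here is a minimal sketch. Only recognize_sphinx and recognize_google are named in this article; the other method names are the ones I recall from the 3.8-era releases of the library, so treat them as assumptions and check your installed version. The names ENGINE_METHODS and methods_for are hypothetical helpers made up for this illustration.

```python
# Hypothetical lookup table: engine label -> (Recognizer method name, needs internet?)
# Method names other than recognize_google / recognize_sphinx are assumptions.
ENGINE_METHODS = {
    "Microsoft Bing Speech": ("recognize_bing", True),
    "Google Web Speech API": ("recognize_google", True),
    "Google Cloud Speech": ("recognize_google_cloud", True),
    "SoundHound Houndify": ("recognize_houndify", True),
    "IBM Speech to Text": ("recognize_ibm", True),
    "CMUSphinx": ("recognize_sphinx", False),
    "Wit.ai": ("recognize_wit", True),
}


def methods_for(online):
    """Return the method names usable with (online=True) or without a connection."""
    return sorted(
        method
        for method, needs_net in ENGINE_METHODS.values()
        if online or not needs_net
    )


print("Offline:", methods_for(online=False))
print("Online: ", methods_for(online=True))
```

With a real Recognizer instance r, something like getattr(r, name, None) is a quick way to see whether your installed version still ships a given method.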

Ok pals, with that clarified, let’s proceed with the installation of the speech recognition module. Let’s first create a folder that will contain our script along with all of the dependencies we are about to install. I called mine Speech Recognition, but you can pick the best name you can come up with, okey?

Once we create the folder, let’s change directories and, once we are in it, let us type the following command in order to install this magical module:

$ pip install SpeechRecognition

Besides that, since we will be using the microphone to dictate our queries so that our script can look them up in the browser, we will need some other dependencies for this to work. Let’s install these additional packages:

PyAudio Package: The process for installing PyAudio will vary depending on your operating system. Ironically, the easiest installation will be on Windows, which I find a bit odd: I am no Windows hater, but I know, right?

Debian Linux: If you’re on Debian-based Linux (like Ubuntu) you can install PyAudio with apt:

$ sudo apt-get install python3-pyaudio

Once installed, you may still need to run pip install pyaudio, especially if you are working in a virtual environment.

macOS: For macOS, first you will need to install PortAudio with Homebrew, and then install PyAudio with pip:

$ brew install portaudio
$ pip install pyaudio

Windows OS: On Windows, you can install PyAudio with pip:

$ pip install pyaudio
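Before moving on, it may help to confirm that both packages really landed in the environment you are running. The following is a minimal sketch using only the standard library; find_missing is a hypothetical helper name, and note that the import names (speech_recognition, pyaudio) differ from the pip package names.

```python
from importlib.util import find_spec


def find_missing(modules):
    """Return the names from `modules` that cannot be imported in this environment."""
    return [name for name in modules if find_spec(name) is None]


# Import names, not pip package names
missing = find_missing(["speech_recognition", "pyaudio"])
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All set, both modules can be imported.")
```

If something shows up as missing inside a virtual environment, running the pip commands above with that environment activated is usually the fix.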

Coding our Script!

Okey dokie, pals! That was that, but now it is time to put those bad boys into action, right? Let’s build our code now.

Let’s open our favorite code editor and create a new file called sp_recog.py. Hopefully you are also using VS Code just like me on this side of the screen, lol.

Time to write the first lines of code. Let’s import that knight that will do the whole lot of the magic of this script by typing the following in our new file:

# Importing the libraries that will do the magic part 🐵
import speech_recognition as sr
import webbrowser as wb

Now, let’s create the function that will hold our whole routine.

def fn_speech_recognition():

Now, let’s initialize the microphone for our speech recognition routine by using the Microphone class and passing the device_index argument with a value of zero, which points to the first audio device listed on our computer.

    sr.Microphone(device_index = 0)

If you, out of curiosity, would like to know how many microphones you have installed in the computer, you could use the following command:

    print(f"MICs Found on this Computer: \n {sr.Microphone.list_microphone_names()}")
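Since the position of each name in that list matches its device index, you could also pick a microphone by name instead of hard-coding zero. Here is a small sketch of that idea; find_mic_index is a hypothetical helper, and the sample list of names is made up so the snippet runs without any audio hardware.

```python
def find_mic_index(names, keyword):
    """Return the index of the first device whose name contains `keyword`, else None."""
    for index, name in enumerate(names):
        # Some entries may be empty, so guard before comparing
        if name and keyword.lower() in name.lower():
            return index
    return None


# With the library installed, the real list would come from:
#   names = sr.Microphone.list_microphone_names()
names = ["Built-in Microphone", "USB Audio Device", "Built-in Output"]  # made-up sample
print(find_mic_index(names, "usb"))      # 1
print(find_mic_index(names, "headset"))  # None
```

The returned number is what you would hand over as device_index when creating the Microphone.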

Now, let’s create the recognizer instance and set up some of the most important parameters in order to make it work smoothly:

    # Creating a recognition object instance
    r = sr.Recognizer()
    r.energy_threshold = 4000
    r.dynamic_energy_threshold = False

Now the microphone will be our source to capture the command given by the user. We will use the adjust_for_ambient_noise and listen methods to do this:

    with sr.Microphone() as source:
        print('Please Speak Loud and Clear:')
        # calibrate the energy threshold against the ambient noise
        r.adjust_for_ambient_noise(source)
        # take voice input from the microphone
        audio = r.listen(source)

As usual, I recommend using a try… except block to manage your errors. Let’s complete the code for this tutorial by integrating this block as follows:

        try:
            phrase = r.recognize_google(audio)
            print(f"Did you just say: {phrase} ?")
            url = "https://www.google.com/search?q="
            search_url = url + phrase
            wb.open(search_url)
        # speech is unintelligible
        except sr.UnknownValueError:
            print("Could not understand what you've requested.")
        # the speech service could not be reached or refused the request
        except sr.RequestError as msg:
            print(f"Could not get results from the speech service: {msg}")
        # note: sr.WaitTimeoutError comes from r.listen() when it is given a
        # timeout, not from recognize_google(), so it is not handled here
        else:
            print("Your results will appear in the default browser. Good bye for now...")
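One detail worth a second look is the search URL: gluing the raw phrase onto it works for simple words, but spaces and symbols such as & can garble the query. A minimal sketch of a safer approach uses quote_plus from the standard library; build_search_url and SEARCH_URL are hypothetical names introduced just for this example.

```python
from urllib.parse import quote_plus

SEARCH_URL = "https://www.google.com/search?q="


def build_search_url(phrase):
    """Percent-encode the phrase so spaces and symbols survive in the URL."""
    return SEARCH_URL + quote_plus(phrase)


print(build_search_url("python speech recognition"))
# https://www.google.com/search?q=python+speech+recognition
print(build_search_url("rock & roll"))
# https://www.google.com/search?q=rock+%26+roll
```

In the script, the result of build_search_url(phrase) would simply take the place of url + phrase before calling wb.open.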

Last, we call the function and start using our script: are you nervous? I am!

fn_speech_recognition()

Final Source Code:

Now that we have broken our script down, let’s put together the final source code for you guys to test out and play with. Please leave your comments with other approaches or better solutions, so I can update the post if needed.

# Basic Speech Recognition Demonstration Routine for my medium blog 😊
# Made with ❤️ in Python 3 by Alvison Hunter - November 14th, 2021
# JavaScript, Python and Web Development tips at: https://bit.ly/3p9hpqj

# Importing the libraries that will do the magic part 🐵
import speech_recognition as sr
import webbrowser as wb


def fn_speech_recognition():
    sr.Microphone(device_index=0)
    print(f"MICs Found on this Computer: \n {sr.Microphone.list_microphone_names()}")
    # Creating a recognition object
    r = sr.Recognizer()
    r.energy_threshold = 4000
    r.dynamic_energy_threshold = False

    with sr.Microphone() as source:
        print('Please Speak Loud and Clear:')
        # calibrate the energy threshold against the ambient noise
        r.adjust_for_ambient_noise(source)
        # take voice input from the microphone
        audio = r.listen(source)
        try:
            phrase = r.recognize_google(audio)
            print(f"Did you just say: {phrase} ?")
            url = "https://www.google.com/search?q="
            search_url = url + phrase
            wb.open(search_url)
        # speech is unintelligible
        except sr.UnknownValueError:
            print("Could not understand what you've requested.")
        # the speech service could not be reached or refused the request
        except sr.RequestError as msg:
            print(f"Could not get results from the speech service: {msg}")
        else:
            print("Your results will appear in the default browser. Good bye for now...")


fn_speech_recognition()

Thanks for reading this article, I hope you enjoyed it as much as I did writing it. Until next time, dear readers!

❤️ Your enjoyment of this article encourages me to write further.
💬 Kindly share your valuable opinion by leaving a comment.
🔖 Bookmark this article for future reference.
🔗 If this article has truly helped you, please share it.
