Transcribe speech input to text on Azure Cloud

Introduction

The Speech-to-Text aspect of the Speech service transcribes audio streams into text. Your application can display this text to the user or act upon it as command input. You can use this service either with an SDK client library (for supported platforms and languages) or a representational state transfer (REST) API.
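As a rough illustration of the REST route, here is a minimal Python sketch that builds a recognition request for a short WAV file using only the standard library. It assumes you already have a Speech resource key and region. The endpoint path, the Ocp-Apim-Subscription-Key header, and the DisplayText response field follow the short-audio REST API as I understand it, so check them against the current documentation before relying on them. The function names build_stt_request and transcribe are hypothetical helpers, not part of any SDK.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(region, key, wav_bytes, language="en-US"):
    """Build (but do not send) a short-audio speech-to-text request.

    Assumption: the regional endpoint follows the pattern
    https://<region>.stt.speech.microsoft.com/speech/recognition/...
    """
    query = urllib.parse.urlencode({"language": language, "format": "simple"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # The resource key from the Azure portal authenticates the call.
        "Ocp-Apim-Subscription-Key": key,
        # Assumption: the audio is 16 kHz PCM in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


def transcribe(region, key, wav_path, language="en-US"):
    """Send the audio file and return the recognized text, if any."""
    with open(wav_path, "rb") as audio_file:
        request = build_stt_request(region, key, audio_file.read(), language)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # With format=simple, the transcript is expected under "DisplayText".
    return result.get("DisplayText", "")
```

Calling transcribe("westus", "<your-key>", "sample.wav") would send the file and return the transcript; the region, key placeholder, and file name here are illustrative only.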

Overview of speech-to-text

The speech-to-text feature of the Speech service, part of Azure Cognitive Services, provides real-time transcription of audio streams using machine learning. The Speech service APIs let developers add end-to-end, real-time speech transcription to their applications or services.
The Speech service is designed for real-time speech-to-text scenarios such as:

Translation of live presentations
In-person or remote translated communications
Customer support
Business intelligence
Media subtitling
Multilingual AI interactions

Create a Speech Service

Before you can begin transcribing speech to text, you need to create an Azure Speech resource. You can do this by using the Azure portal, the Azure CLI, or the Cloud Shell. This exercise uses the Azure portal.

Sign in to the Azure portal.
Select + Create a resource. In the Search the Marketplace box, type speech and press Enter.
In the Results list, select Speech. In the Speech pane, select Create.
Enter a unique name for your Speech Service resource.
Select an appropriate subscription.
Choose a location to host the resource. This is typically the region where the resource will be used.
For the Pricing tier, select a tier. The available tiers may change, but currently you can select F0 (free) or S0 (standard). For testing, we selected F0.
Create a new resource group (RG) named mslearn-speechapi to hold your resources. You can also choose an existing RG if you wish.
Select Create to create the Speech resource.
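
The same steps can also be run through the Azure CLI instead of the portal. The Python sketch below only assembles the two az commands (create the resource group, then create the Speech resource) and prints them so you can review them before running anything. The resource name my-speech-demo and the westus location are placeholders I chose for illustration, and the helper speech_resource_commands is hypothetical; the resource group name and F0 tier match the portal steps above.

```python
import shlex


def speech_resource_commands(name, resource_group, location, sku="F0"):
    """Return the az CLI commands that mirror the portal steps above."""
    return [
        # Create (or reuse) the resource group that holds the resource.
        ["az", "group", "create",
         "--name", resource_group, "--location", location],
        # Create the Speech resource; --yes accepts the terms without a prompt.
        ["az", "cognitiveservices", "account", "create",
         "--name", name, "--resource-group", resource_group,
         "--kind", "SpeechServices", "--sku", sku,
         "--location", location, "--yes"],
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own name and region.
    for command in speech_resource_commands(
        "my-speech-demo", "mslearn-speechapi", "westus"
    ):
        print(shlex.join(command))
```

Each printed line can be pasted into a terminal or the Cloud Shell once you have signed in with az login.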
