This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.
What I Built
I built a web application that records live audio from the browser microphone, transcribes the recording, and then translates the transcript into any of 15 languages.
Demo
https://transcribe-and-translate.netlify.app/
Journey
I used the API for AssemblyAI's Universal-2 Speech-to-Text model to transcribe the audio recording. I got the API key from my AssemblyAI account dashboard. I built an audio transcriber function that takes an audio file and passes it to AssemblyAI's transcriber (aai.Transcriber()), which turns the speech into text.
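Here is a minimal sketch of what such a transcriber function might look like with the AssemblyAI Python SDK. The function name `transcribe_audio`, the `ASSEMBLYAI_API_KEY` environment variable, and the optional `transcriber` parameter are assumptions made for illustration; they are not taken from the project's code.

```python
import os


def transcribe_audio(audio_path, transcriber=None):
    """Return the transcript text for an audio file.

    `transcriber` is a hypothetical hook for passing in a pre-built (or fake)
    transcriber; when omitted, an aai.Transcriber() is created.
    """
    if transcriber is None:
        # Imported lazily so the function can be exercised without the SDK.
        import assemblyai as aai  # pip install assemblyai

        # Assumed: the API key from the dashboard is stored in this env var.
        aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
        transcriber = aai.Transcriber()

    # Uploads the file and waits until the transcript is ready.
    transcript = transcriber.transcribe(audio_path)

    # The transcript carries an error message if transcription failed.
    if transcript.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")

    return transcript.text
```

Calling `transcribe_audio("recording.wav")` with the key set in the environment should return the recognized speech as a plain string, ready to be handed to the translation step.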
Along with the audio transcription, I also implemented a translation feature using Google's Gemini 1.5 Pro 002 model. This feature leverages the multimodal capability of Google's Gemini models to translate the audio transcript into any of 15 languages, including Spanish, Hindi, Yoruba, and Dutch.
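The translation step could be sketched roughly as follows, assuming the `google-generativeai` Python SDK. The prompt wording, the function names, the `GEMINI_API_KEY` environment variable, and the model identifier string `gemini-1.5-pro-002` are illustrative assumptions, not the project's actual code.

```python
import os


def build_translation_prompt(transcript, target_language):
    """Build a plain-text instruction asking the model to translate."""
    return (
        f"Translate the following transcript into {target_language}. "
        "Return only the translated text.\n\n"
        f"{transcript}"
    )


def translate_transcript(transcript, target_language, model=None):
    """Translate transcript text into the target language.

    `model` is a hypothetical hook for passing in a pre-built (or fake)
    model; when omitted, a Gemini model is created.
    """
    if model is None:
        # Imported lazily so the prompt logic can be used without the SDK.
        import google.generativeai as genai  # pip install google-generativeai

        # Assumed: the Gemini API key is stored in this env var.
        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        model = genai.GenerativeModel("gemini-1.5-pro-002")

    prompt = build_translation_prompt(transcript, target_language)
    response = model.generate_content(prompt)
    return response.text.strip()
```

For example, `translate_transcript("Good morning", "Yoruba")` would send a single prompt to the model and return its reply with surrounding whitespace removed. Keeping the prompt builder separate makes it easy to adjust the instruction for each of the supported languages.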
You can find all the code on GitHub: https://github.com/Ifeanyi55/Transcribe-and-Translate