DEV Community

Ifeanyi Idiaye

Transcription & Translation App Powered by AssemblyAI & Google Gemini

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I built a web application that records live audio from the browser's microphone, transcribes the recording, and then translates the transcript into any of 15 languages.

Demo

https://transcribe-and-translate.netlify.app/

AudioTranscriber

Journey

I used the API for AssemblyAI's Universal-2 Speech-to-Text model to transcribe the audio recording, with an API key from my AssemblyAI account dashboard. I wrote an audio transcriber function that takes an audio file and passes it to AssemblyAI's transcriber (aai.Transcriber()), which turns the speech into text.
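
For readers who want to try this step, here is a minimal sketch of what such a transcriber function might look like with the AssemblyAI Python SDK. It is not the code from the repository. The function name transcribe_audio, the optional transcriber parameter, and the ASSEMBLYAI_API_KEY environment variable are assumptions made for illustration.

```python
import os


def transcribe_audio(audio_path, transcriber=None):
    """Return the transcript text for the audio file at audio_path.

    transcriber is any object with a transcribe(path) method. When it is
    omitted, an aai.Transcriber() from the AssemblyAI Python SDK is created.
    """
    if transcriber is None:
        # Imported lazily so the function can be exercised with a stub.
        import assemblyai as aai

        # ASSEMBLYAI_API_KEY is an assumed environment variable name.
        aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
        transcriber = aai.Transcriber()

    transcript = transcriber.transcribe(audio_path)

    # A failed job is reported on the returned transcript object.
    if transcript.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript.text
```

Accepting the transcriber as a parameter keeps the function easy to test without a network call; in the app it would simply be called with the path of the recorded audio file.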

Alongside transcription, I implemented a translation feature using Google's Gemini 1.5 Pro 002 model. This feature uses the multilingual capability of Google's Gemini models to translate the transcript into any of 15 languages, including Spanish, Hindi, Yoruba, and Dutch.
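
The translation step could be sketched roughly as follows with the google-generativeai Python SDK. This is an illustration under stated assumptions, not the app's actual code: the prompt wording, the function names build_translation_prompt and translate_transcript, and the GEMINI_API_KEY environment variable are all hypothetical.

```python
import os


def build_translation_prompt(transcript_text, target_language):
    """Build a plain-text instruction that asks for the translation only."""
    return (
        f"Translate the following transcript into {target_language}. "
        "Return only the translated text.\n\n"
        f"{transcript_text}"
    )


def translate_transcript(transcript_text, target_language, model=None):
    """Return transcript_text translated into target_language.

    model is any object with a generate_content(prompt) method whose result
    has a .text attribute. When omitted, a Gemini model is created.
    """
    if model is None:
        # Imported lazily so the function can be exercised with a stub.
        import google.generativeai as genai

        # GEMINI_API_KEY is an assumed environment variable name.
        genai.configure(api_key=os.environ["GEMINI_API_KEY"])
        model = genai.GenerativeModel("gemini-1.5-pro-002")

    prompt = build_translation_prompt(transcript_text, target_language)
    response = model.generate_content(prompt)
    return response.text.strip()
```

A call such as translate_transcript("Good morning, everyone.", "Spanish") would then send one prompt to the model and return its reply as a string.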

You can find all the code on GitHub: https://github.com/Ifeanyi55/Transcribe-and-Translate
