Transcription & Translation App Powered by AssemblyAI & Google Gemini

#devchallenge #assemblyaichallenge #ai #api

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

I built a web application that records live audio from the browser microphone, transcribes the recording, and then translates the transcript into any of 15 languages.

Demo

https://transcribe-and-translate.netlify.app/

Journey

I used the API for AssemblyAI's Universal-2 Speech-to-Text model to transcribe the audio recording, with an API key from my AssemblyAI account dashboard. I built an audio transcriber function that takes an audio file and passes it to AssemblyAI's transcriber (aai.Transcriber()), which turns the speech into text.
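As a rough illustration of the transcription step, here is a minimal sketch using the AssemblyAI Python SDK. It is not the code from the repository: the function name `transcribe_audio`, the `ASSEMBLYAI_API_KEY` environment variable, and the optional `transcriber` parameter (added so the function can be exercised without network access) are all assumptions made for this example.

```python
import os


def transcribe_audio(audio_path, transcriber=None):
    """Return the transcript text for an audio file.

    `transcriber` is a hypothetical injection point for this sketch; when it
    is omitted, a real aai.Transcriber() is created.
    """
    if transcriber is None:
        # Imported lazily so the module loads even without the SDK installed.
        import assemblyai as aai

        # Assumption: the API key is stored in this environment variable.
        aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]
        transcriber = aai.Transcriber()

    # transcribe() uploads the file and waits for the transcript to finish.
    transcript = transcriber.transcribe(audio_path)

    # A failed transcript carries an error message instead of text.
    if transcript.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")

    return transcript.text or ""
```

With the SDK installed and the key set, `transcribe_audio("recording.wav")` would return the recognized text as a string, where `recording.wav` is a placeholder file name.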

Alongside transcription, I implemented a translation feature using Google's Gemini 1.5 Pro 002 model. This feature uses the multimodal capability of the Gemini models to translate the audio transcript into any of 15 languages, including Spanish, Hindi, Yoruba, and Dutch.
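The translation step could look something like the sketch below, which sends the transcript to Gemini with a plain text prompt via the `google-generativeai` package. The prompt wording, the function names, the `GOOGLE_API_KEY` environment variable, and the optional `model` parameter are assumptions for illustration; the app's actual prompt and structure may differ.

```python
import os


def build_translation_prompt(transcript, target_language):
    """Build a plain-text instruction asking for a translation only."""
    return (
        f"Translate the following transcript into {target_language}. "
        "Return only the translated text.\n\n"
        f"{transcript}"
    )


def translate_transcript(transcript, target_language, model=None):
    """Translate transcript text into the target language with Gemini.

    `model` is a hypothetical injection point for this sketch; when it is
    omitted, a real GenerativeModel is created.
    """
    if model is None:
        # Imported lazily so the module loads even without the SDK installed.
        import google.generativeai as genai

        # Assumption: the API key is stored in this environment variable.
        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        model = genai.GenerativeModel("gemini-1.5-pro-002")

    prompt = build_translation_prompt(transcript, target_language)
    response = model.generate_content(prompt)
    return response.text.strip()
```

For example, `translate_transcript("Good morning, everyone.", "Yoruba")` would ask the model for a Yoruba rendering of that sentence. Keeping the prompt in its own function makes the wording easy to adjust per language.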

You can find all the code on GitHub: https://github.com/Ifeanyi55/Transcribe-and-Translate
