This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.
What I Built
A Speech-to-Text transcription web application that uses Flask for the backend and AssemblyAI's API for real-time audio transcription. The frontend, built with HTML, CSS, and jQuery, offers an interactive interface for controlling the transcription process and viewing the transcribed text in real time.
Demo
Here is the link to my app
Journey
Key Features
Real-Time Transcription:
- Utilizes AssemblyAI's real-time API to process live audio input from the user's microphone and convert it to text.
- Supports both partial and final transcripts.
Web Interface:
- Clean and intuitive design with buttons to start and stop transcription.
- Displays the transcribed text dynamically in a formatted <pre> block.
Flask Backend:
- Handles routes for starting (/start), stopping (/stop), and retrieving the transcript (/transcript).
- Runs transcription in a separate thread to ensure non-blocking operations.
Polling Mechanism:
- Implements a JavaScript-based polling system using jQuery to fetch the latest transcribed text every second.
Customizable Word Boost:
- Boosts recognition accuracy for specific words like "AWS," "Azure," and "Google Cloud."
Responsive Design:
- Ensures usability across devices with a centralized, easy-to-use layout.
Technology Stack
Backend:
- Python (Flask): Manages the web server and API interactions.
- AssemblyAI API: Handles speech-to-text transcription.
```python
import assemblyai as aai
from flask import Flask, render_template, jsonify
import os
from dotenv import load_dotenv
import threading

app = Flask(__name__)

load_dotenv()
aai.settings.api_key = os.getenv('API_KEY')

transcriber = None
transcribed_text = ""


def on_open(session_opened: aai.RealtimeSessionOpened):
    # The SDK passes the opened session to this callback
    print("Transcription started!")


def on_data(transcript: aai.RealtimeTranscript):
    global transcribed_text
    if not transcript.text:
        return
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        # Only final transcripts are appended to the text served by /transcript
        transcribed_text += transcript.text + "\n"
        print("Transcribed:", transcript.text)  # Verify text here
    else:
        print("Received partial:", transcript.text)


def on_error(error):
    print("Error:", error)


def on_close():
    print("Transcription stopped!")


def start_transcription():
    global transcriber
    microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)
    transcriber = aai.RealtimeTranscriber(
        # MicrophoneStream captures 16-bit PCM audio, so the encoding must match it
        encoding=aai.AudioEncoding.pcm_s16le,
        sample_rate=16_000,
        word_boost=["aws", "azure", "google cloud"],
        end_utterance_silence_threshold=500,
        on_open=on_open,
        on_data=on_data,
        on_error=on_error,
        on_close=on_close,
    )
    transcriber.connect()  # open the realtime session before streaming any audio
    for audio_data in microphone_stream:
        if transcriber is not None:
            transcriber.stream(audio_data)
        else:
            break


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/start')
def start():
    global transcribed_text
    transcribed_text = ""  # Clear previous transcript
    threading.Thread(target=start_transcription).start()
    return jsonify({"message": "Transcription started!"})


@app.route('/stop')
def stop():
    global transcriber
    if transcriber is not None:
        transcriber.close()
        transcriber = None
        print("Transcriber closed")
    return jsonify({"message": "Transcription stopped!"})


@app.route('/transcript')
def transcript():
    global transcribed_text
    return jsonify({"transcript": transcribed_text})


if __name__ == "__main__":
    app.run(debug=True)
```
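To sanity-check the three routes without opening the browser UI, a quick script along these lines works. This is just an illustrative sketch: it assumes the app is running on Flask's default development address (http://127.0.0.1:5000) and that the requests package is installed.

```python
import time

import requests  # pip install requests

BASE = "http://127.0.0.1:5000"  # Flask's default dev-server address (assumption)

print(requests.get(f"{BASE}/start").json())       # {"message": "Transcription started!"}
time.sleep(5)                                     # speak into the microphone for a few seconds
print(requests.get(f"{BASE}/transcript").json())  # {"transcript": "...final text so far..."}
print(requests.get(f"{BASE}/stop").json())        # {"message": "Transcription stopped!"}
```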
Frontend:
- HTML & CSS: Provides structure and styling for the user interface.
- jQuery: Handles AJAX requests for starting, stopping, and polling the transcription.
```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Speech to Text App</title>
    <script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
    <style>
        body {
            margin: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh; /* Full viewport height */
            font-family: Arial, sans-serif;
            background-color: #f4f4f4; /* Light background for better readability */
        }
        #container {
            text-align: center;
            background: #ffffff;
            padding: 20px;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            border-radius: 8px;
        }
        button {
            margin: 10px;
            padding: 10px 20px;
            font-size: 16px;
            border: none;
            border-radius: 5px;
            background-color: #007bff;
            color: white;
            cursor: pointer;
        }
        button:hover {
            background-color: #0056b3;
        }
        pre {
            padding: 10px;
            background-color: #e9ecef;
            border-radius: 5px;
            overflow: auto;
        }
    </style>
</head>
<body>
    <div id="container">
        <h1>Speech-to-Text Transcription</h1>
        <button id="start">Start Transcription</button>
        <button id="stop">Stop Transcription</button>
        <h2>Transcribed Text:</h2>
        <pre id="transcript"></pre>
    </div>

    <script>
        $(document).ready(function() {
            let pollInterval; // Variable to hold the interval ID

            // Start transcription
            $('#start').click(function() {
                $.get('/start', function(data) {
                    console.log(data.message);
                    // Start polling for transcripts if not already polling
                    if (!pollInterval) {
                        pollInterval = setInterval(function() {
                            $.ajax({
                                type: 'GET',
                                url: '/transcript',
                                dataType: 'json',
                                success: function(data) {
                                    console.log(data);
                                    if (data && data.transcript) {
                                        $('#transcript').text(data.transcript);
                                    } else {
                                        $('#transcript').text('No transcription available yet.');
                                    }
                                },
                                error: function(err) {
                                    console.error('Error fetching transcript:', err);
                                }
                            });
                        }, 1000);
                    }
                });
            });

            // Stop transcription
            $('#stop').click(function() {
                $.get('/stop', function(data) {
                    console.log(data.message);
                    // Stop polling for transcripts
                    if (pollInterval) {
                        clearInterval(pollInterval);
                        pollInterval = null; // Reset the interval variable
                    }
                });
            });
        });
    </script>
</body>
</html>
```
Audio Input:
- AssemblyAI's MicrophoneStream: Streams audio data for real-time processing.
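MicrophoneStream comes from the SDK's extras (installed with pip install "assemblyai[extras]", which pulls in PyAudio) and is consumed chunk by chunk, just like the loop in start_transcription above. Here is a minimal sketch, purely for inspection, of what those chunks look like:

```python
# Peek at the raw audio chunks the microphone produces (a sketch, not part of the app).
# Assumes the extras are installed: pip install "assemblyai[extras]"
import assemblyai as aai

microphone_stream = aai.extras.MicrophoneStream(sample_rate=16_000)

for i, chunk in enumerate(microphone_stream):
    print(f"chunk {i}: {len(chunk)} bytes of raw audio")
    if i >= 4:  # stop after a handful of chunks
        break
```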
I also brought in a few additional tools to enhance the project: the #FlaskWebFramework for rendering templates and returning JSON responses, and the #dotenv library for loading environment variables from the .env file. On the frontend, I used CSS to style the user interface.
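For reference, the .env file that load_dotenv() reads only needs the single variable the backend looks up with os.getenv('API_KEY'); the value below is a placeholder:

```
# .env — keep this file out of version control
API_KEY=<your AssemblyAI API key>
```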
Lastly, I want to thank my team, @devnenyasha and @lindiwe09, for the UI idea. If not for them, my UI would have been a mess.