Omkar tripathi

DialogueAI: An Interactive Playground for AssemblyAI's Speech-to-Text and LeMUR APIs That Generates Code for Your Configuration

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text and No More Monkey Business.

What I Built

I built DialogueAI, an interactive platform built on AssemblyAI's speech-to-text API and its LeMUR summarization API. The goal is to help users who are new to these APIs get past the steep learning curve that comes with unfamiliar documentation.

Key Features of the Platform:

  1. Interactive Playground: Users can explore and experiment with various API functionalities through an intuitive interface. Input boxes, selection options, model selection, and summary types are all easily adjustable.

  2. Instant Results: With a single click, users can execute API calls and see the results immediately. This feature helps bridge the gap between learning and actual implementation.

  3. Code Generation: For those who prefer to handle API calls manually, the platform generates the necessary code snippets, which can be directly run on their systems. This feature significantly reduces the time and effort required to understand and use the API.

  4. Smart Summary Page: Like the main playground, this page offers configuration options and examples to help users generate transcript summaries quickly. Users can also copy the generated code to run on their own.

Together, these features let users learn AssemblyAI's APIs by trying them out, rather than by working through the documentation first. The platform is aimed at developers and anyone else who wants to add speech-to-text and summarization to a project.
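To make the code generation idea concrete, here is a minimal sketch of how a configuration could be turned into a runnable snippet. It is not DialogueAI's actual implementation: the `generateSnippet` function, the `ASSEMBLYAI_API_KEY` environment variable name, and the example audio URL are all assumptions made for illustration.

```typescript
// Hypothetical generator: turns an endpoint and a request body into a
// copy-pasteable fetch snippet. All names here are illustrative.
function generateSnippet(endpoint: string, body: Record<string, unknown>): string {
  // Pretty-print the body, then indent continuation lines so they sit
  // neatly inside the generated fetch call.
  const prettyBody = JSON.stringify(body, null, 2).replace(/\n/g, "\n  ");
  return [
    `const response = await fetch(${JSON.stringify(endpoint)}, {`,
    `  method: "POST",`,
    `  headers: {`,
    `    // The key is read from the environment rather than embedded here.`,
    `    authorization: process.env.ASSEMBLYAI_API_KEY ?? "",`,
    `    "content-type": "application/json",`,
    `  },`,
    `  body: JSON.stringify(${prettyBody}),`,
    `});`,
    `console.log(await response.json());`,
  ].join("\n");
}

// Example: generate a snippet for a transcription request.
const snippet = generateSnippet("https://api.assemblyai.com/v2/transcript", {
  audio_url: "https://example.com/meeting.mp3", // placeholder URL
  filter_profanity: true,
});
console.log(snippet);
```

Keeping the API key out of the generated text is a deliberate choice in this sketch, so a snippet can be shared without leaking credentials.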

Journey

The inspiration for this platform came from my own experience when I first encountered AssemblyAI's API. I found it a bit confusing to get started with the documentation and the API usage. So, I set out to solve this problem not just for myself but for everyone else who might face the same challenge.

Tech Used

  • Frontend: React, TypeScript, Tailwind CSS
  • API: AssemblyAI Speech-to-Text API, LeMUR summary API
  • Animations: Framer Motion

Working Features

  1. Interactive Speech-to-Text Configurations:
    • Users can easily configure and experiment with various settings.
    • Single Click Run: Execute the configuration and see results immediately.
    • Single Click Code Generation: Generates the code based on the configuration for users to use directly.

Configurations Available:

  • API Key
  • Speech Model
  • Word Boost
  • Profanity Filter
  • Audio Range
  • Audio Intelligence
  • Summary Model
  • Summary Type
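As a rough illustration, the options listed above could map onto a transcription request body along these lines. The `PlaygroundConfig` shape and `buildTranscriptRequest` are hypothetical names, and the audio URL field is an assumption, since the list does not say how the audio is supplied. Here "Audio Intelligence" is represented only by a summarization toggle. The request field names (`speech_model`, `word_boost`, `filter_profanity`, `audio_start_from`, `audio_end_at`, `summarization`, `summary_model`, `summary_type`) follow AssemblyAI's REST API as I understand it, so check them against the current API reference before relying on them.

```typescript
// Hypothetical shape of the playground's form state (names are illustrative).
interface PlaygroundConfig {
  audioUrl: string;            // assumption: audio is supplied as a URL
  speechModel: string;         // e.g. "best" or "nano"
  wordBoost: string[];         // terms the model should favor
  profanityFilter: boolean;
  audioRangeMs?: { start: number; end: number }; // optional clip, in milliseconds
  summarization: boolean;      // stands in for the "Audio Intelligence" toggle
  summaryModel: string;        // e.g. "informative"
  summaryType: string;         // e.g. "bullets"
}

// Translate UI state into a JSON body for POST /v2/transcript.
// Optional settings are only included when the user has set them.
function buildTranscriptRequest(config: PlaygroundConfig): Record<string, unknown> {
  const body: Record<string, unknown> = {
    audio_url: config.audioUrl,
    speech_model: config.speechModel,
    filter_profanity: config.profanityFilter,
  };
  if (config.wordBoost.length > 0) {
    body.word_boost = config.wordBoost;
  }
  if (config.audioRangeMs) {
    body.audio_start_from = config.audioRangeMs.start;
    body.audio_end_at = config.audioRangeMs.end;
  }
  if (config.summarization) {
    body.summarization = true;
    body.summary_model = config.summaryModel;
    body.summary_type = config.summaryType;
  }
  return body;
}

// Example: boost one term, clip the first 30 seconds, and request a bullet summary.
const transcriptBody = buildTranscriptRequest({
  audioUrl: "https://example.com/meeting.mp3", // placeholder URL
  speechModel: "nano",
  wordBoost: ["LeMUR"],
  profanityFilter: true,
  audioRangeMs: { start: 0, end: 30000 },
  summarization: true,
  summaryModel: "informative",
  summaryType: "bullets",
});
console.log(JSON.stringify(transcriptBody, null, 2));
```

The API key is left out of the body on purpose: it would travel in the `authorization` header of the request instead.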

Interactive Speech-to-Text Configurations


Generated code and summary

  2. Interactive Summary Generation with LeMUR:
    • Users can generate summaries with various options and configurations.
    • Single Click Run: Instantly generate summaries.
    • Single Click Code Generation: Provides the code for generating summaries.

Configurations Available:

  • API Key
  • Summary Type (Basic, Custom)
  • Transcript ID
  • Model
  • Prompt
  • Custom Prompt
  • Max Output Tokens (Example Pre-coded)
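The summary options above could be assembled into a LeMUR request roughly as follows. This is a sketch under assumptions, not the platform's code: `LemurConfig` and `buildLemurRequest` are hypothetical names, and routing "Basic" to the summary endpoint and "Custom" to the task endpoint is my guess at how the two summary types differ. The endpoint paths and the `transcript_ids`, `final_model`, `prompt`, and `max_output_size` fields follow AssemblyAI's LeMUR REST API as I understand it, so verify them against the current documentation.

```typescript
// Hypothetical form state for the summary page (names are illustrative).
interface LemurConfig {
  summaryType: "basic" | "custom";
  transcriptId: string;
  model: string;             // example value below; check the docs for current model names
  customPrompt?: string;     // only used when summaryType is "custom"
  maxOutputTokens?: number;
}

interface LemurRequest {
  path: string;
  body: Record<string, unknown>;
}

// Pick an endpoint and build the JSON body from the form state.
function buildLemurRequest(config: LemurConfig): LemurRequest {
  const body: Record<string, unknown> = {
    transcript_ids: [config.transcriptId],
    final_model: config.model,
  };
  if (config.maxOutputTokens !== undefined) {
    body.max_output_size = config.maxOutputTokens;
  }
  if (config.summaryType === "custom") {
    // Assumption: a custom summary is sent as a free-form task prompt.
    if (!config.customPrompt) {
      throw new Error("A custom summary needs a prompt.");
    }
    body.prompt = config.customPrompt;
    return { path: "/lemur/v3/generate/task", body };
  }
  return { path: "/lemur/v3/generate/summary", body };
}

// Example: a custom summary of one transcript, capped at 500 output tokens.
const lemurRequest = buildLemurRequest({
  summaryType: "custom",
  transcriptId: "YOUR_TRANSCRIPT_ID", // placeholder
  model: "anthropic/claude-3-5-sonnet",
  customPrompt: "Summarize the key decisions in three bullet points.",
  maxOutputTokens: 500,
});
console.log(lemurRequest.path, JSON.stringify(lemurRequest.body, null, 2));
```

Failing early when a custom summary has no prompt keeps an incomplete form from producing a request that the API would reject anyway.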


In Development

  • Chat with the Transcript: Using LeMUR API to enable interactions with the generated transcript.
  • Interactive Quiz Generation: Generate quizzes based on the transcript.


Journey

So far, I've addressed the initial problem for both the speech-to-text API and the LeMUR summary model. The project has been exciting to work on, both in designing the API interactions and in building the user interface.

Looking ahead, I plan to expand the platform to include interactive playgrounds and code generation capabilities for real-time APIs and more sophisticated use cases of LeMUR. This will further streamline the learning and implementation process for developers and enhance the overall user experience.
