This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text & No More Monkey Business.
What I Built
I built AudioIntel, a platform that turns audio content into actionable intelligence using AssemblyAI's APIs. It helps users extract insights from audio through transcription, analysis, and AI-powered features.
Live Demo: https://audiointel.amitwani.dev
Demo Video
Journey
The Inspiration
The idea for AudioIntel came from my own struggles with processing audio content efficiently. As someone who consumes a lot of podcasts, interviews, and video content, I often wanted to extract the key insights quickly without listening to hours of audio. I realized this is a common pain point for content creators, researchers, and professionals.
Learning & Iterations
- Integration with AssemblyAI's APIs for transcription and analysis
- Leveraging AssemblyAI's speaker diarization and sentiment analysis features
- Leveraging AssemblyAI with LeMUR for summarization, question answering, and intelligent content analysis
- Error handling in audio processing and real-time status updates
- State management for handling complex UI interactions
- Performance optimization for processing large audio files
- Database integration using Neon PostgreSQL with Drizzle ORM
- User authentication implementation with Better Auth
- Language translation features using Google Translate API
- File upload handling through UploadThing integration
Features Showcase
Multiple Input Sources
- File Upload: Support for various audio formats through UploadThing integration
- Browser Recording: Direct audio capture using the Web Audio API
- YouTube Integration: YouTube video to audio conversion and analysis
Real-time Analysis
- Speaker diarization with timeline visualization
- Sentiment analysis with color-coded segments
- Interactive transcript search and navigation
- Interactive chat with the transcript
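The color-coded sentiment view and the transcript search could both be driven by a small pure helper over the sentiment results that AssemblyAI returns. The following is a minimal sketch, not AudioIntel's actual code: the `SentimentSegment` shape mirrors the fields of `sentiment_analysis_results` (times in milliseconds), while the function names, the `TimelineRow` type, and the color names are assumptions made for illustration.

```typescript
// Subset of one entry in AssemblyAI's `sentiment_analysis_results`
// (start/end are in milliseconds).
type Sentiment = "POSITIVE" | "NEUTRAL" | "NEGATIVE";

interface SentimentSegment {
  text: string;
  start: number;
  end: number;
  sentiment: Sentiment;
  speaker: string | null;
}

// Hypothetical color mapping for the UI; the names are illustrative.
const SENTIMENT_COLORS: Record<Sentiment, string> = {
  POSITIVE: "green",
  NEUTRAL: "gray",
  NEGATIVE: "red",
};

// Hypothetical row type that a timeline component could render.
interface TimelineRow {
  speaker: string;
  start: number;
  end: number;
  color: string;
  text: string;
}

// Turn raw segments into timeline rows with a speaker label and a color.
function toTimelineRows(segments: SentimentSegment[]): TimelineRow[] {
  return segments.map((s) => ({
    speaker: s.speaker ?? "Unknown",
    start: s.start,
    end: s.end,
    color: SENTIMENT_COLORS[s.sentiment],
    text: s.text,
  }));
}

// Case-insensitive search; the UI can jump to `start` of each match.
function searchTimeline(rows: TimelineRow[], query: string): TimelineRow[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return [];
  return rows.filter((r) => r.text.toLowerCase().includes(needle));
}

const rows = toTimelineRows([
  { text: "I love this podcast.", start: 0, end: 1800, sentiment: "POSITIVE", speaker: "A" },
  { text: "The audio quality was poor.", start: 1800, end: 4200, sentiment: "NEGATIVE", speaker: "B" },
]);
console.log(searchTimeline(rows, "audio"));
```

Keeping the mapping and the search as pure functions means they can be unit-tested without calling the API.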
Smart Content Generation
- AI-powered blog post creation
- Context-aware chat interface
- Key sections identification with timestamps
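AssemblyAI reports timestamps in milliseconds, so listing key sections needs a small formatting step before display. A minimal sketch, assuming a hypothetical `KeySection` shape (the post does not show how sections are stored) and hypothetical helper names:

```typescript
// Hypothetical shape for a key section; `start` is in milliseconds.
interface KeySection {
  title: string;
  start: number;
}

// Format milliseconds as m:ss, or h:mm:ss for content longer than an hour.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const ss = String(seconds).padStart(2, "0");
  if (hours > 0) {
    return `${hours}:${String(minutes).padStart(2, "0")}:${ss}`;
  }
  return `${minutes}:${ss}`;
}

// Produce display labels sorted by start time, without mutating the input.
function sectionLabels(sections: KeySection[]): string[] {
  return [...sections]
    .sort((a, b) => a.start - b.start)
    .map((s) => `[${formatTimestamp(s.start)}] ${s.title}`);
}

console.log(
  sectionLabels([
    { title: "Closing thoughts", start: 3725000 },
    { title: "Introduction", start: 5000 },
  ]),
);
```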
Language Translation
- Translate transcript to multiple languages
Screenshots
Multiple Sources - Audio file, Record file & YouTube
Overview & Analysis
Interactive Features
Tech Stack
- Framework: Next.js 14 with App Router
- Language: TypeScript
- Database: Neon PostgreSQL with Drizzle ORM
- UI: Tailwind CSS + shadcn/ui
- Audio Processing: AssemblyAI
- File Upload: UploadThing
- Analytics: OpenPanel
- Authentication: Better Auth
- Translation: Google Translate
- Deployment: Vercel
Technical Architecture
Technical Implementation
AssemblyAI Integration
I used several features of AssemblyAI's SDK:
- Transcription API
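    import { AssemblyAI } from "assemblyai";

    // Client setup, assuming the API key is available as ASSEMBLYAI_API_KEY
    const assemblyai = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY! });

    // Transcribe with speaker labels, a bulleted summary, and sentiment analysis
    const transcript = await assemblyai.transcripts.transcribe({
      audio: fileUrl, // URL of the uploaded audio file
      speaker_labels: true,
      summarization: true,
      summary_model: "conversational",
      summary_type: "bullets",
      sentiment_analysis: true,
    });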
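As far as I understand the SDK, `transcribe` resolves once processing has finished, and a failed job is reported through the `status` and `error` fields of the returned transcript, so it is worth checking them before reading any results. A minimal sketch of that check follows; the `TranscriptLike` type, the `UiState` type, and the `toUiState` helper are assumptions for illustration, not AudioIntel's actual code.

```typescript
// Subset of the transcript fields used here. AssemblyAI reports one of
// these four statuses; `error` carries the reason when status is "error".
interface TranscriptLike {
  id: string;
  status: "queued" | "processing" | "completed" | "error";
  text?: string | null;
  error?: string | null;
}

// Hypothetical UI state derived from a transcript.
type UiState =
  | { kind: "pending"; message: string }
  | { kind: "ready"; transcriptId: string; text: string }
  | { kind: "failed"; message: string };

// Map every transcript status to exactly one UI state.
function toUiState(t: TranscriptLike): UiState {
  switch (t.status) {
    case "queued":
      return { kind: "pending", message: "Waiting in queue" };
    case "processing":
      return { kind: "pending", message: "Transcribing audio" };
    case "completed":
      return { kind: "ready", transcriptId: t.id, text: t.text ?? "" };
    case "error":
      return { kind: "failed", message: t.error ?? "Transcription failed" };
  }
}

const failed = toUiState({ id: "t1", status: "error", error: "File not found" });
console.log(failed);
```

Modeling the state as a discriminated union lets the compiler flag any status the UI forgets to handle.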
- LeMUR for Content Generation
    // Generate blog post
    const { response: blogPostResponse } = await assemblyai.lemur.task({
      transcript_ids: [transcript.id],
      prompt: `Generate a blog post from the transcript in markdown format`,
      final_model: "anthropic/claude-3-5-sonnet",
    });

    // Generate actionable insights
    const { response: insights } = await assemblyai.lemur.task({
      transcript_ids: [transcript.id],
      prompt: `Provide actionable insights from the transcript`,
      final_model: "anthropic/claude-3-5-sonnet",
    });
- LeMUR for Interactive Chat
    const { response: qas } = await assemblyai.lemur.questionAnswer({
      transcript_ids: [transcriptId],
      final_model: "anthropic/claude-3-5-sonnet",
      questions: [{ question: userMessage, answer_format: "short sentence" }],
    });
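The `questionAnswer` response contains one entry per question, each pairing the question with its answer. A minimal sketch of turning that into a chat reply follows; the `QaPair` shape mirrors those two response fields, while `toChatReply` and the fallback message are assumptions for illustration.

```typescript
// One question/answer entry from the LeMUR question-answer response.
interface QaPair {
  question: string;
  answer: string;
}

// Hypothetical fallback shown when LeMUR returns nothing usable.
const FALLBACK_REPLY = "Sorry, I could not find an answer in the transcript.";

// Pick the answer for the user's message, falling back to the first entry.
function toChatReply(qas: QaPair[], userMessage: string): string {
  const match = qas.find((qa) => qa.question === userMessage) ?? qas[0];
  const answer = match?.answer.trim() ?? "";
  return answer !== "" ? answer : FALLBACK_REPLY;
}

console.log(
  toChatReply([{ question: "Who spoke?", answer: " Two hosts. " }], "Who spoke?"),
);
```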
Future Enhancements
- Multi-language support
- Advanced analytics dashboard
- API endpoints
- Custom templates
- Advanced search capabilities
Source Code
mtwn105 / audio-intel
AudioIntel - Audio/Video Intelligence, Transcripts, Summary, and much more
AudioIntel
Transform audio into actionable intelligence with our powerful AI platform. AudioIntel helps you extract valuable insights from audio content through transcription, analysis, and AI-powered features.
Features
- Multiple Input Methods
  - Upload audio files (MP3, WAV)
  - Record directly in browser
  - Analyze YouTube videos
- AI-Powered Analysis
  - Smart summaries and key takeaways
  - Sentiment analysis
  - Speaker identification
  - Actionable insights generation
- Content Generation
  - Automatic blog post creation
  - Interactive chat with transcripts
  - Key sections identification
- Advanced Features
  - Timeline view with precise timestamps
  - Multi-speaker detection
  - Searchable transcripts
  - Real-time sentiment tracking
Getting Started
Prerequisites
- Node.js 18+
- npm or yarn
- AssemblyAI API key
Installation
- Clone the repository
    git clone https://github.com/mtwn105/audio-intel.git
    cd audio-intel
- Install dependencies
    npm install
    # or
    yarn install
- Set up environment variables
    cp .env.example .env
  Required environment variables:
    ASSEMBLYAI_API_KEY=your_api_key
    NEXT_PUBLIC_APP_URL=http://localhost:3000
    UPLOADTHING_TOKEN=your_uploadthing_token
    GOOGLE_GENERATIVE_AI_API_KEY=your_google_generative_ai_api_key
    GOOGLE_TRANSLATE_API_KEY=your_google_translate_api_key
    BETTER_AUTH_SECRET=your_better_auth_secret
    BETTER_AUTH_BASE_URL=http://localhost:3000
    DATABASE_URL=your_database_url
- Run the development server
    npm run dev
    # or
    yarn dev
Open…
Submission
This submission is for the "Sophisticated Speech-to-Text" and "No More Monkey Business" prompts of the AssemblyAI Challenge.
Conclusion
I had a great time participating in the AssemblyAI Challenge and learned a lot from the experience. I'm looking forward to seeing what other developers come up with!
Thank you to Dev.to and AssemblyAI for organizing this challenge and giving developers a platform to showcase their skills!