Jack Bridger

Posted on May 15, 2024 • Edited on Aug 3, 2024

Build your own AI Video editor with Node.js, AssemblyAI & StreamPot

#javascript #node #assemblyai #ai

Note: there is now an updated guide for this using hosted StreamPot.

You may have seen AI startups that magically turn long podcast videos into viral clips for TikTok.

To do this, they use a Large Language Model (LLM), like GPT-4, to find the best bits.

In this guide, you’ll learn how to build your own AI video editor.

You will:

Use AssemblyAI to transcribe and generate video highlights.
Use StreamPot to extract audio and make clips.

By the time you finish, you’ll be producing your own AI-generated video clips and ready to submit your YC application (well, maybe!).

Here’s an example of a starting clip & a generated clip.

What is AssemblyAI?

AssemblyAI is a set of AI APIs for working with audio, including transcription and running LLMs on transcripts.

What is StreamPot?

StreamPot is a tool for processing video.

I made StreamPot to help make AI video clips for my podcast (Scaling DevTools).

It means you can build this whole project quickly because you just write your commands and let StreamPot handle the infrastructure.

Prerequisites

S3 bucket details. I recommend using Cloudflare’s R2. Here’s a guide I wrote.
Docker installed (& running). If you don't want to use Docker, please check out this guide instead.
AssemblyAI account with credits if you want to run the full process.
Node.js (I used v20.10.0)

Step 1: Running StreamPot

First, set up a new project folder and initialise it:



mkdir ai-editor && cd ai-editor && npm init -y

Then create a .env and input your S3 bucket details from Cloudflare or AWS.



# .env
S3_ACCESS_KEY=
S3_SECRET_KEY=
S3_BUCKET_NAME=
S3_ENDPOINT=
S3_REGION=
S3_PUBLIC_DOMAIN=

For more information on how to get bucket details from Cloudflare, see How to setup Cloudflare R2 buckets & generate access key.

You will need a domain name in order to set up S3_PUBLIC_DOMAIN. If you don't have one, I recommend checking out the hosted version of StreamPot.
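Before starting the containers, it can help to fail fast if any of these variables is still blank. Here is a minimal sketch; check-env.js and findMissingEnvVars are hypothetical names (not part of StreamPot), and it assumes Node.js 20.6 or later so that node can load the file itself with --env-file.

```javascript
// check-env.js: a hypothetical helper, not part of StreamPot.
// Run with: node --env-file=.env check-env.js   (Node.js 20.6 or later)
const REQUIRED_S3_VARS = [
    'S3_ACCESS_KEY',
    'S3_SECRET_KEY',
    'S3_BUCKET_NAME',
    'S3_ENDPOINT',
    'S3_REGION',
    'S3_PUBLIC_DOMAIN',
];

// Returns the names of required variables that are unset or blank
function findMissingEnvVars(env, required = REQUIRED_S3_VARS) {
    return required.filter((name) => !env[name] || env[name].trim() === '');
}

const missing = findMissingEnvVars(process.env);
if (missing.length > 0) {
    console.warn(`Fill these in before running docker compose up: ${missing.join(', ')}`);
} else {
    console.log('All S3 variables are set.');
}
```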

Once you have filled in your .env, create a compose.yml file for running StreamPot:



# compose.yml
services:
  server:
    image: streampot/server:latest
    environment:
      NODE_ENV: production
      DATABASE_URL: postgres://postgres:example@db:5432/example
      REDIS_CONNECTION_STRING: redis://redis:6379
      S3_ACCESS_KEY: ${S3_ACCESS_KEY}
      S3_SECRET_KEY: ${S3_SECRET_KEY}
      S3_REGION: ${S3_REGION}
      S3_BUCKET_NAME: ${S3_BUCKET_NAME}
      S3_ENDPOINT: ${S3_ENDPOINT}
      S3_PUBLIC_DOMAIN: ${S3_PUBLIC_DOMAIN}
      REDIS_HOST: redis
      REDIS_PORT: 6379
    ports:
      - "3000:3000"
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
  db:
    image: postgres:16
    restart: always
    user: postgres
    volumes:
      - db-data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=example
      - POSTGRES_PASSWORD=example
    expose:
      - 5432
    healthcheck:
      test: [ "CMD", "pg_isready" ]
      interval: 10s
      timeout: 5s
      retries: 5
  redis:
    image: redislabs/redismod
    ports:
      - '6379:6379'
    healthcheck:
      test: [ "CMD", "redis-cli", "--raw", "incr", "ping" ]
volumes:
  db-data:

Make sure Docker is running and then start the server by running this in the same directory as your project:



$ docker compose up

After a few seconds, StreamPot will be running locally on http://127.0.0.1:3000, which means you can use the API in your app.

Hints:

Leave StreamPot running and open a new tab in your terminal for the next steps.
Make sure you set your .env variables before running docker compose up
Wait for docker compose up to finish and to see the message "Server listening at http://0.0.0.0:3000"
If using Cloudflare, make sure your S3_REGION is one of these location hints, in lower case (upper case won’t work):

Hint	Hint description
wnam	Western North America
enam	Eastern North America
weur	Western Europe
eeur	Eastern Europe
apac	Asia-Pacific

Step 2: Extracting audio from a video

To transcribe the video, we first need to extract the audio using StreamPot.

Install the @streampot/client library as well as dotenv:



npm i @streampot/client dotenv

Then import and initialise the StreamPot client in a new index.js file, using dotenv to load your .env:



// index.js
require('dotenv').config(); // loads .env into process.env (alternatively, run node with --env-file=.env on Node 20.6+)
const StreamPot = require('@streampot/client');

const streampot = new StreamPot({
    baseUrl: 'http://127.0.0.1:3000'  // This should match your StreamPot server's address
});

To extract audio from the video, write the following:



// index.js
async function extractAudio(videoUrl) {
    const job = await streampot.input(videoUrl)
        .noVideo()
        .output('output.mp3')
        .run();
}

Notice how we take the input videoUrl, call noVideo() to drop the video stream, and use .mp3 as the output extension.
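If you know ffmpeg, it may help to see roughly what this job asks for. StreamPot's builder methods share their names with fluent-ffmpeg's, where noVideo() adds the -vn flag; assuming StreamPot translates them the same way, the job corresponds approximately to the arguments below. audioExtractionArgs is a hypothetical helper for illustration only; the tutorial doesn't need it.

```javascript
// Hypothetical helper: the ffmpeg CLI arguments that roughly correspond to
// streampot.input(videoUrl).noVideo().output('output.mp3'), assuming
// StreamPot maps builder methods to flags the way fluent-ffmpeg does
function audioExtractionArgs(videoUrl, outputFile = 'output.mp3') {
    return [
        '-i', videoUrl,  // input: the full video
        '-vn',           // noVideo(): drop the video stream
        outputFile,      // ffmpeg infers the MP3 format from the extension
    ];
}

console.log(['ffmpeg', ...audioExtractionArgs('example.webm')].join(' '));
// ffmpeg -i example.webm -vn output.mp3
```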

However, extractAudio as written only submits the job. You still need to wait for it to finish.

So, write a pollStreampotJob helper function that waits for the job's status to be 'completed':



// index.js
// Checks the job's status every `interval` ms until it completes or fails
async function pollStreampotJob(jobId, interval = 5000) {
    while (true) {
        const job = await streampot.checkStatus(jobId);
        if (job.status === 'completed') {
            return job;
        } else if (job.status === 'failed') {
            throw new Error('StreamPot job failed');
        }
        // Not finished yet: wait before checking again
        await new Promise(resolve => setTimeout(resolve, interval));
    }
}
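The loop above will wait forever if a job gets stuck in a non-final state. A variant with an upper bound on attempts is sketched below. It takes the status-check function as a parameter so it can be tried without a server; pollWithTimeout and maxAttempts are hypothetical names, and it assumes the status object has a status field, as used above.

```javascript
// Hypothetical variant of pollStreampotJob with an attempt limit.
// checkStatus is injected, e.g. (id) => streampot.checkStatus(id)
async function pollWithTimeout(checkStatus, jobId, { interval = 5000, maxAttempts = 120 } = {}) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const job = await checkStatus(jobId);
        if (job.status === 'completed') return job;
        if (job.status === 'failed') throw new Error(`StreamPot job ${jobId} failed`);
        // Don't sleep after the final attempt
        if (attempt < maxAttempts) {
            await new Promise((resolve) => setTimeout(resolve, interval));
        }
    }
    throw new Error(`StreamPot job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

With the defaults, this gives up after roughly ten minutes (120 checks, 5 seconds apart). You would call it as pollWithTimeout((id) => streampot.checkStatus(id), job.id).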

And then update your extractAudio function like so:



// index.js
async function extractAudio(videoUrl) {
    const job = await streampot.input(videoUrl)
        .noVideo()
        .output('output.mp3')
        .run();

    // Wait for the job to finish, then return the public URL of the MP3
    return (await pollStreampotJob(job.id))
        .output_url[0]
        .public_url;
}

extractAudio returns the URL of an MP3 file containing only the audio stripped from the video.
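Both extractAudio and the clip step in Step 4 read output_url[0].public_url from the finished job. If a job ever completes without an output, that expression throws a TypeError that's hard to interpret. A small guard can give a clearer error; getPublicUrl is a hypothetical helper, and the job shape is assumed from the code above.

```javascript
// Hypothetical helper: reads the first output's public URL from a
// completed StreamPot job, using the shape accessed in extractAudio
function getPublicUrl(job) {
    const url = job?.output_url?.[0]?.public_url;
    if (!url) {
        throw new Error(`Job ${job?.id ?? '(unknown)'} finished without a public output URL`);
    }
    return url;
}
```

With it, the last line of extractAudio would become return getPublicUrl(await pollStreampotJob(job.id));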

Test that it works by adding a main() function at the bottom of your file with a test video URL (find your own or use this one from Scaling DevTools):



// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'
    const audioUrl = await extractAudio(EXAMPLE_VID)
    console.log(audioUrl)
}
main()

To test, run node index.js in a new terminal window (inside your project). After a few moments, you will see a URL where you can download the extracted MP3 audio.

Your code should look like this

Step 3: Finding a highlight

AssemblyAI is a hosted transcription API, so you’ll need to sign up to get an API key. Then set it in your .env:



ASSEMBLY_API_KEY=

Then install assemblyai:



npm i assemblyai

And configure it in index.js:



// index.js
const { AssemblyAI } = require('assemblyai')

const assembly = new AssemblyAI({
    apiKey: process.env.ASSEMBLY_API_KEY
})

And then transcribe the audio:



// index.js
function getTranscript(audioUrl) {
    return assembly.transcripts.transcribe({ audio: audioUrl });
}

The returned transcript includes the raw text (transcript.text) as well as word-level timestamps in milliseconds (transcript.words). It looks something like this:



// raw transcript: 
"And it was kind of funny"

// timestamped transcript:
[
    { start: 240, end: 472, text: "And", confidence: 0.98, speaker: null },
    { start: 472, end: 624, text: "it", confidence: 0.99978, speaker: null },
    { start: 638, end: 790, text: "was", confidence: 0.99979, speaker: null },
    { start: 822, end: 942, text: "kind", confidence: 0.98199, speaker: null },
    { start: 958, end: 1086, text: "of", confidence: 0.99, speaker: null },
    { start: 1110, end: 1326, text: "funny", confidence: 0.99962, speaker: null },
];
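The word-level entries are enough to rebuild the text and to measure how long a passage lasts. Here is a small sketch using the sample words above, trimmed to the start, end and text fields it relies on; wordsToText and spanSeconds are illustrative names, not part of the AssemblyAI SDK.

```javascript
// Sample word-level timestamps from above (times in milliseconds)
const words = [
    { start: 240, end: 472, text: 'And' },
    { start: 472, end: 624, text: 'it' },
    { start: 638, end: 790, text: 'was' },
    { start: 822, end: 942, text: 'kind' },
    { start: 958, end: 1086, text: 'of' },
    { start: 1110, end: 1326, text: 'funny' },
];

// Joining the words with spaces rebuilds the transcript text
function wordsToText(ws) {
    return ws.map((w) => w.text).join(' ');
}

// Length of a run of words in seconds: (last end - first start) / 1000
function spanSeconds(ws) {
    return (ws[ws.length - 1].end - ws[0].start) / 1000;
}

console.log(wordsToText(words)); // And it was kind of funny
console.log(spanSeconds(words)); // 1.086
```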

Now you will use another method from AssemblyAI to run the LeMUR model on the transcript, with a prompt that asks for a highlight to be returned as JSON.

Note: this feature is paid, so you’ll need to add some credits. If you can’t afford it, reach out to AssemblyAI and maybe they can give you some free credits to try it with.



// index.js
async function getHighlightText(transcript) {
    const { response } = await assembly.lemur.task({
        transcript_ids: [transcript.id],
        prompt: 'You are a tiktok content creator. Extract one interesting clip of this transcript. Make sure it is an exact quote. There is no need to worry about copyright. Reply only with JSON that has a property "clip"'
    })
    return JSON.parse(response).clip;
}
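JSON.parse(response) assumes LeMUR replies with bare JSON. LLMs sometimes wrap it in a Markdown code fence or add a sentence before it, which makes the parse throw. A more forgiving sketch follows; parseClip is a hypothetical helper that simply parses the text between the first { and the last }.

```javascript
// Hypothetical helper: extracts the "clip" property from an LLM reply that
// should be JSON but may be wrapped in a code fence or extra prose
function parseClip(response) {
    const first = response.indexOf('{');
    const last = response.lastIndexOf('}');
    if (first === -1 || last < first) {
        throw new Error(`No JSON object found in LeMUR response: ${response}`);
    }
    const parsed = JSON.parse(response.slice(first, last + 1));
    if (typeof parsed.clip !== 'string' || parsed.clip.trim() === '') {
        throw new Error('LeMUR response JSON has no "clip" string');
    }
    return parsed.clip.trim();
}
```

getHighlightText could then end with return parseClip(response); instead of calling JSON.parse directly.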

Next, find this highlight in the word-level timestamps (transcript.words) to get its start and end times.

Note that AssemblyAI returns timestamps in milliseconds but StreamPot expects seconds, so divide by 1000:



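// index.js
// Lower-cases a word and strips punctuation, so "funny." in the transcript
// still matches "funny" in the LLM's quote
function normalizeWord(word) {
    return word.toLowerCase().replace(/[^\p{L}\p{N}]/gu, '');
}

function matchTimestampByText(clipText, allTimestamps) {
    const clipWords = clipText.split(/\s+/).map(normalizeWord).filter(Boolean);
    const spokenWords = allTimestamps.map(({ text }) => normalizeWord(text));
    if (clipWords.length === 0) return null;

    // Try each position in the transcript as the start of the clip
    for (let s = 0; s + clipWords.length <= spokenWords.length; s++) {
        if (clipWords.every((word, k) => spokenWords[s + k] === word)) {
            return {
                start: allTimestamps[s].start / 1000,                     // ms -> s
                end: allTimestamps[s + clipWords.length - 1].end / 1000,  // ms -> s
            };
        }
    }
    return null; // the quote was not found word-for-word in the transcript
}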

You can test it by adjusting your main function:



// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'
    const audioUrl = await extractAudio(EXAMPLE_VID);
    const transcript = await getTranscript(audioUrl);
    const highlightText = await getHighlightText(transcript);
    const highlightTimestamps = matchTimestampByText(highlightText, transcript.words);

    console.log(highlightTimestamps)
}
main()

When you run node index.js, you will see the timestamps logged, e.g. { start: 0.24, end: 12.542 }.

Your code should look like this.

Hints:

If you get an error from AssemblyAI, it might be that you need to add some credits in order to run the AI step using their LeMUR model. You can try the transcription API without a credit card though.
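Word timestamps mark exactly where speech starts and stops, so a clip cut on them can feel abrupt. One optional tweak, not part of the original tutorial, is to pad both ends by a fraction of a second, clamping the start at zero. padClip and its 0.25-second default are hypothetical choices:

```javascript
// Hypothetical helper: widens a { start, end } range (in seconds) by
// `padding` on each side, never starting before 0 and never ending
// after `videoDuration` (if you know it)
function padClip({ start, end }, padding = 0.25, videoDuration = Infinity) {
    return {
        start: Math.max(0, start - padding),
        end: Math.min(videoDuration, end + padding),
    };
}

console.log(padClip({ start: 2, end: 10 })); // { start: 1.75, end: 10.25 }
```

Once makeClip is written in Step 4, you could pass it padClip(highlightTimestamps) instead of the raw timestamps.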

Step 4: Making the clip

Now that you have the timestamps, you can make the clip with StreamPot: take the full video (videoUrl) as the input, set the start time with .setStartTime() and the length of the clip with .setDuration(), and use .mp4 as the output format.

Again, use pollStreampotJob to wait for it to complete:



// index.js
async function makeClip(videoUrl, timestamps) {
    const job = await streampot.input(videoUrl)
        .setStartTime(timestamps.start)
        .setDuration(timestamps.end - timestamps.start)
        .output('clip.mp4')
        .run();

    return (await pollStreampotJob(job.id))
        .output_url[0]
        .public_url;
}
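.setDuration() takes a length, not an end time, which is why makeClip passes timestamps.end - timestamps.start. In fluent-ffmpeg, whose method names StreamPot shares, setStartTime() maps to -ss and setDuration() to -t; assuming StreamPot does the same, the job corresponds roughly to the ffmpeg arguments built by the hypothetical clipArgs helper below.

```javascript
// Hypothetical helper: approximate ffmpeg arguments for the makeClip job,
// assuming fluent-ffmpeg-style mapping of builder methods to flags
function clipArgs(videoUrl, { start, end }, outputFile = 'clip.mp4') {
    if (!(end > start)) {
        throw new Error(`Invalid clip range: start=${start}, end=${end}`);
    }
    return [
        '-i', videoUrl,
        '-ss', String(start),       // setStartTime(start), in seconds
        '-t', String(end - start),  // setDuration(end - start): a length, not an end time
        outputFile,
    ];
}

console.log(['ffmpeg', ...clipArgs('example.webm', { start: 2, end: 10 })].join(' '));
// ffmpeg -i example.webm -ss 2 -t 8 clip.mp4
```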

Then call makeClip from your main function:



// index.js
async function main() {
    const EXAMPLE_VID = 'https://github.com/jackbridger/streampot-ai-video-example/raw/main/example.webm'

    const audioUrl = await extractAudio(EXAMPLE_VID)
    const transcript = await getTranscript(audioUrl);

    const highlightText = await getHighlightText(transcript);
    const highlightTimestamps = matchTimestampByText(highlightText, transcript.words);
    // The LLM's quote may not appear word-for-word in the transcript
    if (!highlightTimestamps) {
        throw new Error(`Could not find the highlight in the transcript: "${highlightText}"`);
    }

    console.log(await makeClip(EXAMPLE_VID, highlightTimestamps))
}
main()

That’s it! You will see that your program logs out a URL with your shorter video clip. Try it out with some alternative videos.
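To try several videos in one run, you could loop over them and keep going when one fails. A minimal sketch, assuming you first move the body of main into a function that takes a video URL and returns the clip URL (processVideo, a hypothetical name). Videos are processed one at a time so a local StreamPot server isn't flooded with jobs.

```javascript
// Hypothetical helper: runs processVideo on each URL in turn and records
// either the resulting clip URL or the error message, so one bad video
// doesn't stop the rest
async function processAll(videoUrls, processVideo) {
    const results = [];
    for (const url of videoUrls) {
        try {
            results.push({ url, clipUrl: await processVideo(url) });
        } catch (err) {
            results.push({ url, error: err.message });
        }
    }
    return results;
}
```

For example: processAll([EXAMPLE_VID, anotherVideoUrl], processVideo).then(console.table).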

Here is a repo with the full code.

Thanks for making it this far! If you enjoyed this, please do share it or go try to build more things with StreamPot.

And if you have feedback on this tutorial, and especially on StreamPot, please message me on Twitter or email me at jack@bitreach.io

Top comments (5)

Jack Bridger • May 15 '24

Hope you enjoy this article, please let me know if you have any feedback

Matija Sosic • May 16 '24

This is really detailed! Also seems super useful as a product.

Jack Bridger • May 16 '24

Thanks so much Matija!

Joseph Roddy • May 15 '24

Very cool Jack!

Do you have any recommendations for a good place to deploy this? I'm assuming a lambda style environment doesn't make a lot of sense for something that might take a really long time.

Jack Bridger • May 15 '24

Thank you Joe!! I think something like Hetzner would be the best option and is what we're planning to do when we launch a hosted version!
