Shannon Lal

Posted on Feb 28

Creating Thumbnails for a Video in Python with FFMPEG

#moviepy #ffmpeg #videocompression #python

This post continues my previous post, which gave an initial introduction to creating thumbnails for videos. Here I walk through the steps, with code, for creating a thumbnail using a Python library called moviepy.

The steps I am going to go through for creating a thumbnail are as follows:

Download a video from Google Cloud Storage
Load the video file to get the duration
Determine the appropriate times to get the snapshot
Retrieve the frames from the video
Generate a thumbnail based on the frames
Upload the thumbnails back onto Google Cloud Storage

Download the Video from Google Storage

The following code downloads a movie from Google Cloud Storage. It needs the name of the bucket and the path of the movie within that bucket.

from google.cloud import storage

def get_movie(bucket_name: str, path: str) -> bytes:
    # Fetch the object at `path` in the bucket and return its raw bytes
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(path)
    return blob.download_as_bytes()

Load Video to get duration

To get the duration, I used moviepy (which uses FFMPEG behind the scenes) to load the video and read its metadata, including the duration. Because FFMPEG works on files, we first need to write the movie to disk and then extract the metadata.

import os
import tempfile

from moviepy.editor import VideoFileClip  # moviepy 1.x; in 2.x use `from moviepy import VideoFileClip`

def get_movie_duration(movie_file: bytes) -> float:
    tmpfile_path = None
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmpfile:
            tmpfile_path = tmpfile.name  # Store the temporary file name
            tmpfile.write(movie_file)
            tmpfile.flush()  # Ensure data is written to disk

        # Process the video file to get its duration
        with VideoFileClip(tmpfile_path) as video:
            return video.duration
    finally:
        # Cleanup: Ensure the temporary file is deleted if it exists
        if tmpfile_path and os.path.exists(tmpfile_path):
            os.remove(tmpfile_path)
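
Any step that hands the video to FFMPEG needs it on disk, so the write-then-clean-up pattern can be factored into a small context manager. The following is a minimal sketch using only the standard library; the name `temporary_video_file` is hypothetical and not part of the original code.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_video_file(data: bytes, suffix: str = ".mp4"):
    """Write `data` to a temporary file, yield its path, then delete it."""
    # delete=False so the file can be reopened by path after it is closed
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmpfile:
        tmpfile.write(data)
        path = tmpfile.name
    try:
        yield path
    finally:
        # Cleanup runs even if the caller's block raises
        if os.path.exists(path):
            os.remove(path)
```

Inside a `with temporary_video_file(video_bytes) as path:` block, the path can be passed to anything that expects a file on disk, and the file is removed when the block exits.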

Get the Times for Snapshot

This function returns 4 evenly spaced timestamps (in seconds) based on the duration, skipping the very start and the very end of the video.

def _get_snapshot_times(duration, num_snapshots=4):
    interval = duration / (num_snapshots + 1)
    return [i * interval for i in range(1, num_snapshots + 1)]
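
As a quick worked example of the spacing, here is the same calculation repeated under the name `snapshot_times` (a name chosen only so the snippet runs on its own) for a 60-second video:

```python
def snapshot_times(duration, num_snapshots=4):
    # Split the duration into num_snapshots + 1 equal intervals and
    # take the boundaries between them, skipping 0 and the end.
    interval = duration / (num_snapshots + 1)
    return [i * interval for i in range(1, num_snapshots + 1)]

print(snapshot_times(60))     # [12.0, 24.0, 36.0, 48.0]
print(snapshot_times(10, 1))  # [5.0]
```

With 4 snapshots the video is cut into 5 equal parts, so no snapshot falls on the first or last frame, which are often black or a title card.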

Retrieve the frames at snapshot times

The following function loads the video and retrieves all the frames in one pass.

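from PIL import Image
from moviepy.editor import VideoFileClip

def retrieve_snapshots(video_path, times):
    # Open the video once and grab the frame at each requested time (in seconds)
    with VideoFileClip(video_path) as video:
        frames = [video.get_frame(time) for time in times]
    # Each frame is a NumPy array; convert it to a PIL image
    return [Image.fromarray(frame) for frame in frames]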

Generate thumbnails based on the frames

The following 2 functions are responsible for resizing the images. The first uses the PIL image library to shrink a frame so that its longest side is at most 500 pixels. The second creates a thread pool and parallelizes the resizing across all the snapshots.

FILE_FORMAT = 'png'  # Image format for the encoded thumbnails

def resize_image(image, max_size=500):
    # Calculate the new size, preserving the aspect ratio
    ratio = float(max_size) / max(image.size)
    if ratio >= 1:
        # Already small enough; never upscale
        return image
    new_size = tuple(int(x * ratio) for x in image.size)
    image.thumbnail(new_size)  # Resizes in place
    return image
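
To make the size arithmetic concrete, here is a small standalone sketch. The helper name `scaled_size` is hypothetical; it multiplies before dividing and rounds the result, since truncating a float product with `int()` can leave a side one pixel short.

```python
def scaled_size(size, max_size=500):
    """Return (width, height) scaled so the longest side is at most max_size."""
    longest = max(size)
    if longest <= max_size:
        return tuple(size)  # already small enough; never upscale
    # Multiply before dividing, then round, so the longest side lands on max_size
    return tuple(max(1, round(side * max_size / longest)) for side in size)

print(scaled_size((1920, 1080)))  # (500, 281)
print(scaled_size((400, 300)))    # (400, 300)
```

A 1920x1080 frame is scaled by 500/1920, which gives 500 by 281.25, rounded to 281.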

from concurrent.futures import ThreadPoolExecutor, as_completed

def generate_thumbnail(images):
    snapshot_images = []
    with ThreadPoolExecutor(max_workers=len(images)) as executor:
        # Create a future for each image resize
        futures = [executor.submit(resize_image, image, max_size=500) for image in images]

        # Collect results as they finish (completion order, not input order)
        for future in as_completed(futures):
            try:
                image = future.result()
                snapshot_images.append(image)
            except Exception as exc:
                print(f"Generated an exception: {exc}")
    return snapshot_images
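
One thing to be aware of: `as_completed` yields futures in the order they finish, so the returned thumbnails may not be in the same order as the snapshots. If the order matters (for example, when the index is used to name the uploaded file), `executor.map` is an alternative that keeps input order. The following is a minimal sketch of that alternative, not the approach used above; `map_in_order` is a hypothetical name, and a lambda stands in for `resize_image` so the snippet runs without PIL.

```python
from concurrent.futures import ThreadPoolExecutor

def map_in_order(func, items, max_workers=4):
    """Apply func to every item in parallel; results come back in input order."""
    if not items:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(items))) as executor:
        return list(executor.map(func, items))

# Stand-in for resize_image so the sketch runs without PIL
print(map_in_order(lambda n: n * 2, [1, 2, 3, 4]))  # [2, 4, 6, 8]
```

Unlike the loop above, `executor.map` re-raises the first exception when the results are consumed, so a failed resize stops the whole batch instead of being printed and skipped.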

Upload thumbnails to cloud

The following function writes the images to Google Cloud Storage. The uploads run in parallel to speed up the process.


def upload_thumbnails_to_cloud(images, bucket_name: str, bucket_path: str, storage_client):
    responses = []
    with ThreadPoolExecutor() as executor:
        # _write_thumbnail_to_cloud is a separately defined helper that uploads a single image
        futures = [executor.submit(_write_thumbnail_to_cloud, image, bucket_name, bucket_path, storage_client, index)
                   for index, image in enumerate(images)]

        for future in as_completed(futures):
            try:
                response = future.result()
                responses.append(response)
            except Exception as exc:
                print(f"Generated an exception: {exc}")
    return responses

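The following is how the code all comes together:

storage_client = storage.Client()
video_file = get_movie(bucket_name, bucket_path)
duration = get_movie_duration(video_file)
times = _get_snapshot_times(duration)

# retrieve_snapshots reads from disk, so write the downloaded bytes to a temporary file
with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as tmpfile:
    tmpfile.write(video_file)
try:
    snapshots = retrieve_snapshots(tmpfile.name, times)
finally:
    os.remove(tmpfile.name)

thumbnails = generate_thumbnail(snapshots)

upload_thumbnails_to_cloud(thumbnails, bucket_name, bucket_path, storage_client)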

There are a couple of challenges with this approach: analyzing the video means writing the file to local disk and calling FFMPEG. If you are trying to do this in parallel or under heavy load, this could be quite challenging. To roll this into production, I would make sure you have good auto-scaling capability, or look at something like serverless functions.

I would definitely welcome any comments on the code above or any ways I could make it run faster.

Thanks

Shannon

Top comments (1)

Keyur Paralkar • Mar 3

Great blog post. Much needed automation is required to generate thumbnails of each frame of the video.

I made you a similar script that displays the frame at each second of the video when the seek bar is hovered just like YouTube. You can read more about it here:

dev.to/keyurparalkar/how-i-build-a...
