Ethan

Posted on Sep 16, 2024 • Edited on Oct 9, 2024

Introduction to the WebCodecs API - Real-Time Video Encoding and Display

#javascript #webdev #tutorial #html

Introduction

Hello! 😎

In this tutorial I will be introducing the WebCodecs API and how to use it to display a video track in a canvas element.

The WebCodecs API is an exciting advancement for developers who want low-level control over the encoding and decoding of media streams in the browser. It allows efficient, high-performance encoding and decoding of video and audio without the need for complex external libraries. With WebCodecs, developers can handle tasks like real-time video encoding and transcoding, enabling use cases such as video conferencing, live streaming and video editing directly in the browser.

In this post, we'll break down a practical example of how to use the WebCodecs API to capture video from a user's camera, encode it in real time using VP8, and display the raw camera feed next to the individual frames drawn on a canvas. This will help you understand not only the fundamentals of the WebCodecs API but also how to apply it to real-world scenarios. 😃

Why Use the WebCodecs API?

Before diving into the code, let's discuss why the WebCodecs API is such a valuable tool for developers:

Performance: WebCodecs gives you access to the browser's built-in codecs, which can use hardware acceleration where the device supports it. This makes it faster and more efficient than encoding or decoding in plain JavaScript, which is especially important for real-time applications like video conferencing and live streaming.
Low-level Control: Unlike high-level APIs, WebCodecs allows developers to handle raw media frames directly, providing fine-grained control over encoding parameters such as bit rate, resolution and codec.
Real-Time Processing: WebCodecs is designed with real-time video and audio processing in mind. By allowing direct access to the codec, developers can reduce latency, a critical requirement for applications like gaming, video streaming and conferencing.

Now that we understand the importance of WebCodecs, let's dive into the example code and break it down in detail. 👀

The Complete Example

Here is the full HTML file that demonstrates how to use WebCodecs to capture video, encode it and display the original feed next to the individual frames drawn on a canvas.

I'll be explaining the code after this.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <title>WebCodec Example</title>
    <style>
      body {
        display: flex;
        justify-content: center;
        align-items: center;
        height: 100vh;
        margin: 0;
      }

      video, canvas {
        width: 640px;
        height: 480px;
        border: 2px solid black;
        margin: 0 10px;
      }
    </style>
  </head>
  <body>
    <video id="video" autoplay playsinline></video>
    <canvas id="canvas" width="640" height="480"></canvas>

    <script>
      const videoElement = document.getElementById('video');
      const canvas = document.getElementById('canvas');
      const ctx = canvas.getContext('2d');
      let videoEncoder;

      const initEncoder = () => {
        videoEncoder = new VideoEncoder({
          output: (chunk) => {
            // Receives each EncodedVideoChunk produced by the encoder.
            // console.log('new chunk', chunk);
          },
          error: (error) => {
            console.error('encode error', error);
          }
        });

        videoEncoder.configure({
          codec: 'vp8',
          width: 640,
          height: 480,
          bitrate: 1_000_000,
          framerate: 30
        });
      };

      const handleStream = (stream) => {
        videoElement.srcObject = stream;

        const videoTrack = stream.getVideoTracks()[0];
        const processor = new MediaStreamTrackProcessor(videoTrack);
        const reader = processor.readable.getReader();

        const processFrames = async () => {
          while (true) {
            const { value: videoFrame, done } = await reader.read();

            if (done) {
              console.log('stream ended');
              break;
            }

            ctx.drawImage(videoFrame, 0, 0, canvas.width, canvas.height);

            // The option is spelled keyFrame (camelCase); a lowercase key is ignored.
            const insertKeyFrame = true;
            videoEncoder.encode(videoFrame, { keyFrame: insertKeyFrame });

            videoFrame.close();
          }
        };

        processFrames();
      };

      initEncoder();

      navigator.mediaDevices.getUserMedia({ video: true })
        .then((stream) => handleStream(stream))
        .catch((error) => console.error('failed to get camera', error));
    </script>
  </body>
</html>

Detailed Breakdown of the Code

HTML Structure

The HTML is quite simple. We have two main elements: a video element and a canvas element.

The video element is used to display the raw video stream directly from the user's camera.
The canvas element is used to display the individual frames that we read from the video track before passing them to the encoder.

Both elements are given the same dimensions of 640x480 pixels and are aligned side by side using Flexbox for easy comparison.

CSS for Styling

The CSS part defines a simple layout:

The body element uses Flexbox to center the video and canvas elements both horizontally and vertically on the screen.
Both the video and canvas elements are styled to be 640 pixels wide and 480 pixels high with a solid black border.

This styling ensures that both elements are displayed clearly and have the same size.

JavaScript Initialization and Camera Access

The following code enables us to get access to the camera:

navigator.mediaDevices.getUserMedia({ video: true })
  .then((stream) => handleStream(stream))
  .catch((error) => console.error('failed to get camera', error));

The getUserMedia() method asks the user for permission to use the camera. If access is granted, the resulting stream is passed to handleStream. If the request fails, an error message is logged to the console.
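
The same request can also be written with async/await and explicit constraints. This is only a minimal sketch: buildVideoConstraints and openCamera are hypothetical helper names that are not part of the example, and the 640x480 at 30 fps target is assumed from the encoder settings used later. The browser treats ideal values as preferences, not guarantees.

```javascript
// Hypothetical helper: builds getUserMedia constraints that ask for (but do not
// require) a given resolution and frame rate.
const buildVideoConstraints = (width = 640, height = 480, frameRate = 30) => ({
  audio: false,
  video: {
    width: { ideal: width },
    height: { ideal: height },
    frameRate: { ideal: frameRate }
  }
});

// Hypothetical helper: resolves with the camera stream, or null on failure.
// mediaDevices is passed in so the function does not depend on a global.
const openCamera = async (mediaDevices, constraints = buildVideoConstraints()) => {
  try {
    return await mediaDevices.getUserMedia(constraints);
  } catch (error) {
    // Typical causes: permission denied or no camera attached.
    console.error('failed to get camera', error);
    return null;
  }
};

// Usage in the page:
// openCamera(navigator.mediaDevices).then((stream) => stream && handleStream(stream));
```

Returning null instead of rethrowing keeps the calling code short, at the cost of hiding the exact error from the caller.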

Next we set the video source with the following:

videoElement.srcObject = stream;

Inside the handleStream function, the srcObject is set to the camera stream, which allows the browser to display the live video feed directly in the video element.

const videoTrack = stream.getVideoTracks()[0];
const processor = new MediaStreamTrackProcessor(videoTrack);
const reader = processor.readable.getReader();

With the above, we access the first video track of the stream. The MediaStreamTrackProcessor API then converts the video track into a stream of individual VideoFrame objects that can be read asynchronously using a ReadableStream reader. Note that MediaStreamTrackProcessor is not yet available in every browser, so it is worth checking for it before relying on it.

The readable.getReader() method creates a ReadableStreamDefaultReader that allows us to read individual video frames.
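
Since these APIs are fairly new, a quick feature check before starting can save some confusing errors. A minimal sketch, assuming a hypothetical helper named findMissingApis that just looks for the constructors this example relies on:

```javascript
// Hypothetical helper: lists which of the required constructors are missing
// from a given global scope (window in a page, self in a worker).
const findMissingApis = (
  scope,
  required = ['VideoEncoder', 'VideoFrame', 'MediaStreamTrackProcessor']
) => required.filter((name) => typeof scope[name] !== 'function');

const missingApis = findMissingApis(globalThis);

if (missingApis.length > 0) {
  // In a real page you would probably show a message to the user instead.
  console.warn('this environment is missing:', missingApis.join(', '));
}
```

This only checks that the constructors exist. It says nothing about whether a particular codec or resolution is supported.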

Next we initialize the VideoEncoder:

const initEncoder = () => {
  videoEncoder = new VideoEncoder({
    output: (chunk) => {
      // Process the EncodedVideoChunk (e.g., save or transmit)
    },
    error: (error) => {
      console.error('encode error', error);
    }
  });

  videoEncoder.configure({
    codec: 'vp8',
    width: 640,
    height: 480,
    bitrate: 1_000_000,
    framerate: 30
  });
};

The initEncoder function sets up the VideoEncoder. The VideoEncoder is initialized with two functions:

output: This function is called every time a frame is successfully encoded. It receives the encoded data as an EncodedVideoChunk, which you can handle however you like (e.g., saving it to a file or streaming it to a server).
error: This function logs any encoding errors.
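
To give an idea of what handling an encoded chunk could look like, here is a minimal sketch. chunkToRecord and the records array are hypothetical names used only for illustration; the chunk properties used (type, timestamp, byteLength and copyTo) are part of EncodedVideoChunk.

```javascript
// Hypothetical helper: copies an EncodedVideoChunk into a plain object that can
// be stored or sent over the network.
const chunkToRecord = (chunk) => {
  const data = new Uint8Array(chunk.byteLength);
  chunk.copyTo(data); // copy the encoded bytes into our own buffer

  return {
    type: chunk.type,           // 'key' or 'delta'
    timestamp: chunk.timestamp, // in microseconds
    data
  };
};

// Illustration only: this array grows without limit, so a real application
// would transmit or write out the records instead of keeping them all.
const records = [];

// Could be passed as the output callback when creating the VideoEncoder.
const output = (chunk) => {
  records.push(chunkToRecord(chunk));
};
```

Keeping the type next to the data matters because a decoder needs to start from a 'key' chunk.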

In the configure method, we specify that the encoder will use the VP8 codec, with a resolution of 640x480 pixels, a bit rate of 1 Mbps and a frame rate of 30 frames per second.
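
Not every device supports every configuration, so it can be useful to ask the browser first with VideoEncoder.isConfigSupported(). The sketch below assumes a hypothetical helper named pickSupportedConfig that tries a list of candidate configs in order; the check function is passed in as a parameter so the helper itself has no browser dependency.

```javascript
const encoderConfig = {
  codec: 'vp8',
  width: 640,
  height: 480,
  bitrate: 1_000_000,
  framerate: 30
};

// Hypothetical helper: resolves with the first supported config, or null if
// none of the candidates is supported.
const pickSupportedConfig = async (candidates, isSupported) => {
  for (const candidate of candidates) {
    const { supported, config } = await isSupported(candidate);

    if (supported) {
      return config;
    }
  }

  return null;
};

// Only runs where the WebCodecs API is available.
if (typeof VideoEncoder !== 'undefined') {
  pickSupportedConfig([encoderConfig], (c) => VideoEncoder.isConfigSupported(c))
    .then((config) => console.log('usable config:', config));
}
```

If the result is null, you could fall back to another codec or a lower resolution before calling configure.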

Next we will read and encode the frames:

const processFrames = async () => {
  while (true) {
    const { value: videoFrame, done } = await reader.read();

    if (done) {
      console.log('stream ended');
      break;
    }

    ctx.drawImage(videoFrame, 0, 0, canvas.width, canvas.height);

    // The option is spelled keyFrame (camelCase); a lowercase key is ignored.
    const insertKeyFrame = true;
    videoEncoder.encode(videoFrame, { keyFrame: insertKeyFrame });

    videoFrame.close();
  }
};

The processFrames function reads each frame from the video stream using the reader.read() method. This method returns a promise that resolves with the next video frame.

For each frame:

We use the ctx.drawImage method to draw the frame onto the canvas element.
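We then pass the frame to the videoEncoder.encode() method to encode it. Here, we request a keyframe for every frame. Keyframes are crucial in video encoding because they can be decoded on their own and serve as reference points for the frames that follow. They are also much larger than other frames, so requesting one every time keeps the example simple but gives up most of the compression; real applications usually request them only occasionally.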
Finally, we call videoFrame.close() to release the memory used by the frame.
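
As a variation on the steps above, keyframes can be requested only every so often, and frames can be skipped when the encoder falls behind. This is a minimal sketch, not what the example does: shouldInsertKeyFrame and encodeFrame are hypothetical helpers, and the interval of 150 frames (about 5 seconds at 30 fps) and the queue limit of 2 are assumed values you would want to tune.

```javascript
// Hypothetical helper: true for frame 0 and then once every `interval` frames.
const shouldInsertKeyFrame = (frameIndex, interval = 150) =>
  frameIndex % interval === 0;

// Hypothetical helper: submits a frame unless the encoder is falling behind.
// Returns true if the frame was submitted and false if it was skipped.
const encodeFrame = (encoder, frame, frameIndex, { interval = 150, maxQueue = 2 } = {}) => {
  // encodeQueueSize is the number of encode requests still waiting.
  if (encoder.encodeQueueSize > maxQueue) {
    return false;
  }

  encoder.encode(frame, { keyFrame: shouldInsertKeyFrame(frameIndex, interval) });
  return true;
};

// Usage inside the read loop (frameIndex starts at 0):
// if (encodeFrame(videoEncoder, videoFrame, frameIndex)) frameIndex++;
// videoFrame.close();
```

Only incrementing the index when a frame was actually submitted means a skipped frame never swallows a scheduled keyframe.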

If the video stream ends, the done property of the result returned by reader.read() will be true, at which point we break out of the while loop and stop processing frames. 😁
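
The example simply stops reading at that point. If you also want to tidy up, something like the following could be called after the loop. It is a minimal sketch with a hypothetical helper named shutDown; flush(), close() and state are part of VideoEncoder, and track.stop() releases the camera.

```javascript
// Hypothetical helper: waits for pending output, closes the encoder and
// releases the camera. Resolves with true if the encoder was flushed.
const shutDown = async (encoder, stream) => {
  let flushed = false;

  // flush() is only valid while the encoder is configured.
  if (encoder.state === 'configured') {
    await encoder.flush(); // resolves once all queued frames have been output
    encoder.close();
    flushed = true;
  }

  // Stopping the tracks turns off the camera indicator.
  stream.getTracks().forEach((track) => track.stop());

  return flushed;
};

// Usage after the read loop ends:
// await shutDown(videoEncoder, videoElement.srcObject);
```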

Conclusion

Here I have shown the power and flexibility of the WebCodecs API for handling real-time media streams in the browser. By using WebCodecs, you can gain low-level control over video encoding, which is essential for applications requiring high performance and low latency, such as video conferencing or live streaming.

Key points to remember are:

MediaStreamTrackProcessor is used to break down a video stream into individual frames that can be processed.
VideoEncoder from the WebCodecs API encodes raw video frames into a compressed format like VP8.
You can configure the encoder's parameters, such as resolution, bit rate, and frame rate, to suit your specific use case.

This combination of APIs opens up new possibilities for high-performance, real-time video processing directly in the browser, paving the way for advanced web-based video applications.

I hope this tutorial helped you. I had a lot of fun trying this API out, and I may expand on this and turn it into a series.

As always you can find the code on my Github:
https://github.com/ethand91/webcodec-example

Happy Coding! 😎

Like my work? I post about a variety of topics, so if you would like to see more, please like and follow me.
Also I love coffee.