Man yin Mandy Wong for Tencent Cloud

Next-Gen Media SDK Solution Design (TRTC)

1. Immersive Convergence

1.1 Higher definition

Image description

According to statistics from Tencent Cloud, the average bitrate of internet streaming media played on PCs, tablets, mobile phones, and other terminals has been increasing since H1 2018. As demand for higher definition has grown, compression efficiency has improved alongside the bitrate. This is thanks to the progression from H.264 and H.265 to the recent H.266, which drew on over 100 technical proposals and delivers roughly 50% better compression than H.265.

1.2 Stronger immersiveness

Image description

Advances have been made in the immersive experience of many applications, such as 3D guides, 3D modeling, AR/VR games, and multi-angle sports viewing.

1.3 Enhanced interaction

Image description

Real-time interaction is stronger. For example, face point cloud data is captured on a mobile phone, uploaded to the cloud, and then delivered from the cloud to audience members' devices.

1.4 Lower latency

Image description

Latency has seen the greatest improvement. A few years ago, latency on webpages was measured in seconds; now it is measured in milliseconds, low enough for users to sing duets together in live rooms.

1.5 Four elements of the all-true internet

Image description

The all-true internet features higher definition, enhanced interaction, stronger immersiveness, and lower latency. But this brings unavoidable challenges both in the cloud and on the terminal.

2. Technical Challenges

Let's take a look at the challenges and how to overcome them.

2.1 Challenge 1: RT-Cube™ architecture design

Image description

It's hard to coordinate internal modules no matter what you are working on, from an operating system to something smaller like an SDK. An SDK has many modules. The image shows a simplified version of the SDK module architecture, but it still gives a sense of how many modules are actually involved. The bottom-left corner shows the audio/video engine modules, the bottom-right corner shows the TIM modules, and the top shows the TUI components. When multiple modules work together, they tend to compete for CPU resources and run into other conflicts.

Image description

The image above depicts the architecture design of the audio/video engine in RT-Cube™, which consists of many core modules, each with its own submodules. A great deal of data communication and control logic flows between those modules. When the system runs stably, everything works in unison. However, if the CPU frequency is reduced or memory runs low, competition between modules can quickly cause the entire system to crash. Therefore, a central control module is adopted to monitor the modules in real time and intervene when necessary, so that they stay coordinated and a cascading failure (an "avalanche") is prevented.
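The idea of a central control module can be sketched in a few lines. The following is a minimal illustration only, not RT-Cube™'s actual design: the `Module` and `CentralController` classes, the module names, the three quality levels, and the 85% CPU threshold are all assumptions made for the example. It shows one possible policy, where the controller degrades the least important module first when the load gets too high.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """A hypothetical SDK module that can run at several quality levels."""
    name: str
    priority: int          # lower number = more important, degraded last
    level: int = 2         # 2 = full quality, 1 = reduced, 0 = minimal

    def degrade(self) -> bool:
        """Step down one level; return False if already at the minimum."""
        if self.level == 0:
            return False
        self.level -= 1
        return True


@dataclass
class CentralController:
    """Watches system load and degrades low-priority modules first."""
    modules: List[Module]
    cpu_limit: float = 0.85   # assumed threshold, not from the article
    log: List[str] = field(default_factory=list)

    def on_sample(self, cpu_usage: float) -> None:
        """Handle one monitoring sample (cpu_usage in the range 0.0-1.0)."""
        if cpu_usage <= self.cpu_limit:
            return
        # Intervene on the least important module that can still step down.
        for module in sorted(self.modules, key=lambda m: m.priority, reverse=True):
            if module.degrade():
                self.log.append(f"{module.name} -> level {module.level}")
                return


controller = CentralController([
    Module("audio_capture", priority=0),
    Module("video_encoder", priority=1),
    Module("beauty_filter", priority=2),
])
for usage in (0.60, 0.92, 0.95, 0.97):
    controller.on_sample(usage)
print(controller.log)
```

With the sample loads above, the hypothetical beauty filter is stepped down twice before the video encoder is touched, and audio capture is left alone. Making one component responsible for these decisions is what keeps modules from fighting each other for the same resources.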

2.2 Challenge 2: RT-Cube™ version management

The second challenge relates to versioning. Although we offer many features, not all of them are needed by each customer. When they are packaged into different combinations, we need to manage a larger number of versions.

Image description

If an SDK offers nine features, there are 510 possible combinations, which translates into 510 * 4 = 2,040 versions in total on four platforms.
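The arithmetic can be checked with a short script. Nine independent on/off features give 2^9 = 512 subsets. The figure of 510 matches 2^9 - 2, which would exclude the empty build and the all-features build; that reading is an assumption, since the article does not say which combinations are left out.

```python
from itertools import combinations

FEATURES = 9    # number of optional features, from the example above
PLATFORMS = 4   # number of target platforms, from the example above

# Every subset of the feature list, including the empty and the full build.
all_subsets = 2 ** FEATURES        # 512
non_empty = all_subsets - 1        # 511
partial_only = all_subsets - 2     # 510: excludes empty and all-features builds

# Cross-check the closed form by enumerating subsets of sizes 1..8.
enumerated = sum(
    len(list(combinations(range(FEATURES), k))) for k in range(1, FEATURES)
)
assert enumerated == partial_only

print(partial_only, partial_only * PLATFORMS)    # 510 2040
```

Either way, the count doubles with every added feature, which is why hand-maintained build configurations stop scaling.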

Traditional build tools such as Xcode and Android Studio are no longer sufficient on their own. A new build platform is needed that can output SDKs for different platforms and allow features to be freely combined across versions.

2.3 Challenge 3: RT-Cube™ quality monitoring

Image description

The third challenge is quality monitoring. Imagine that six users are watching a live stream or attending a video conference. Over a period of 20 minutes, one of them experiences 10 seconds of lag, while the others experience none. Measured by total time, the lag rate is only about 0.14% (10 seconds out of 7,200 seconds of total viewing time), which does not reflect how bad a 10-second lag feels. If the rate is instead counted as the percentage of users who experienced lag, the value is 16.7%. Monitoring should therefore focus on the poor-experience data. To keep such cases from being hidden in aggregate figures, the reporting infrastructure is kept unchanged, and a data packet covering lag, size, blur, and acoustic echo is reported every day. The algorithm should be refined and based on per-user metrics so that it reflects the poor experience. The result is then used to work out the number of affected users, the percentage increase or decrease, and the cause. That's how we find ways to improve.
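The two ways of counting can be compared directly. This is a minimal sketch using the numbers from the example above; the function names are made up for illustration. Note that 10 seconds out of 6 × 20 × 60 = 7,200 seconds is 0.1389%, which rounds to 0.14% (0.13% if truncated).

```python
def time_based_lag_rate(lag_seconds, session_seconds):
    """Total lag time divided by total watch time across all users."""
    return sum(lag_seconds) / (session_seconds * len(lag_seconds))


def user_based_lag_rate(lag_seconds):
    """Share of users who experienced any lag at all."""
    affected = sum(1 for s in lag_seconds if s > 0)
    return affected / len(lag_seconds)


lag = [10, 0, 0, 0, 0, 0]   # seconds of lag per user
session = 20 * 60           # each user watched for 20 minutes

print(f"{time_based_lag_rate(lag, session):.2%}")  # 0.14%
print(f"{user_based_lag_rate(lag):.1%}")            # 16.7%
```

The time-based figure averages one user's bad experience across everyone's viewing time, while the user-based figure keeps that user visible.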

2.4 Challenge 4: Module communication efficiency

Image description

The fourth challenge is the efficiency of communication between modules.

This problem is common in games. Many enterprises unify their backend systems using shared standards and microservices, but the iOS, Android, and Windows platforms cannot be normalized simply by writing everything in C++. Image and texture data is handled differently on each platform: iOS and Android each have their own formats, and Windows uses D3D. If C++ is applied, all of them are processed through binary buffers. A great deal of unification work has been done to ensure data performance across the different platforms.
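One common way to approach this kind of unification is a platform-neutral frame type that records whether it carries raw bytes or a platform-native handle, so that a copy into a binary buffer happens only when a module really needs CPU access. The sketch below illustrates that general idea and is not a description of how RT-Cube™ works; `VideoFrame`, `FrameKind`, `read_back`, and `to_raw` are hypothetical names, and the read-back is a stand-in that returns black pixels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FrameKind(Enum):
    RAW_BUFFER = "raw_buffer"        # plain bytes, usable on any platform
    NATIVE_HANDLE = "native_handle"  # e.g. a GPU texture id owned by the platform


@dataclass
class VideoFrame:
    width: int
    height: int
    kind: FrameKind
    data: Optional[bytes] = None     # set when kind is RAW_BUFFER
    handle: Optional[int] = None     # set when kind is NATIVE_HANDLE


def read_back(handle: int, width: int, height: int) -> bytes:
    """Stand-in for an expensive GPU-to-CPU copy (returns black RGBA pixels)."""
    return bytes(width * height * 4)


def to_raw(frame: VideoFrame) -> VideoFrame:
    """Convert to a binary buffer only when a module really needs CPU access."""
    if frame.kind is FrameKind.RAW_BUFFER:
        return frame  # already raw, no copy
    pixels = read_back(frame.handle, frame.width, frame.height)
    return VideoFrame(frame.width, frame.height, FrameKind.RAW_BUFFER, data=pixels)


texture = VideoFrame(4, 2, FrameKind.NATIVE_HANDLE, handle=7)
raw = to_raw(texture)
print(raw.kind.value, len(raw.data))  # raw_buffer 32
```

Modules that can work with the native handle pass the frame along untouched, and only those that need the pixels pay for the conversion.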

3. Optimization and Improvement

Having discussed the challenges and their solutions, we move on to the optimizations and improvements made in the six to twelve months following the infrastructure upgrade.

3.1 Improvement 1: Audio module optimization

3.1.1 Feature

Image description

With the upgraded architecture, the audio/video modules in the new version support many new capabilities, such as full-band audio, 3D audio effects, AI noise reduction based on deep learning, and source and channel resistance. These capabilities enable many more challenging real-time interaction scenarios, such as live duets, which are highly sensitive to audio/video communication latency. In live music scenarios, the music modes are optimized to restore the signal as faithfully as possible and achieve the highest possible resolution. In addition, big data analysis is used for targeted monitoring and real-time analysis of sound problems, steadily reducing the failure rate and complaint rate by improving audio quality.

3.1.2 Use

Image description

Audio modes are more diversified to make the product user-friendly. The speech mode is intended for conference calls. The default mode suits most scenarios and is the one to choose if you are not sure which mode is best. The music mode is intended for music listening. All the parameters can be customized.
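The "preset plus custom parameters" pattern can be sketched as follows. This is an illustrative model only: `AudioMode`, `AudioParams`, and `audio_params` are hypothetical names rather than the SDK's API, and the sample rates, channel counts, and bitrates are placeholder values that do not come from this article.

```python
from dataclasses import dataclass, replace
from enum import Enum


class AudioMode(Enum):
    SPEECH = "speech"
    DEFAULT = "default"
    MUSIC = "music"


@dataclass(frozen=True)
class AudioParams:
    sample_rate_hz: int
    channels: int
    bitrate_kbps: int
    noise_suppression: bool


# Placeholder values for illustration only; not taken from the article.
PRESETS = {
    AudioMode.SPEECH: AudioParams(16_000, 1, 16, True),
    AudioMode.DEFAULT: AudioParams(48_000, 1, 50, True),
    AudioMode.MUSIC: AudioParams(48_000, 2, 128, False),
}


def audio_params(mode: AudioMode = AudioMode.DEFAULT, **overrides) -> AudioParams:
    """Start from a preset and apply any caller-supplied overrides."""
    return replace(PRESETS[mode], **overrides)


print(audio_params(AudioMode.MUSIC, bitrate_kbps=192))
```

Callers who are unsure simply call `audio_params()` and get the default preset, while advanced users override individual fields without redefining the whole configuration.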

3.2 Improvement 2: Video module optimization - effect

Image description

The video module is improved across the board. Specifically, the algorithms for the BT.601 and BT.709 color spaces are improved, and BT.2020 and other HDR color spaces are now supported, which makes images brighter. Targeted optimizations are also made to improve the definition delivered by the SDK without increasing the bitrate.
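To see why the color space matters, consider how luma is derived from RGB. Each ITU-R recommendation defines its own weighting coefficients, so decoding with the wrong set shifts brightness and color. The sketch below uses the published coefficients for the three standards named above; the `luma` helper itself is written for this example, and the article does not say which part of the color pipeline was actually optimized.

```python
# Luma coefficients (Kr, Kb) defined by each ITU-R recommendation; Kg = 1 - Kr - Kb.
COEFFS = {
    "BT.601": (0.299, 0.114),
    "BT.709": (0.2126, 0.0722),
    "BT.2020": (0.2627, 0.0593),
}


def luma(r: float, g: float, b: float, standard: str) -> float:
    """Return Y' for gamma-encoded R'G'B' components in the range 0.0-1.0."""
    kr, kb = COEFFS[standard]
    kg = 1.0 - kr - kb
    return kr * r + kg * g + kb * b


# The same pure-green pixel maps to a different luma under each standard.
for name in COEFFS:
    print(name, round(luma(0.0, 1.0, 0.0, name), 4))
```

White maps to a luma of 1.0 under all three standards, but saturated colors do not agree, which is why a pipeline has to track the color space of every frame it handles.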

3.3 Improvement 3: Network module optimization

3.3.1 Architecture

Image description

Last but not least is the network module, where our core technology is used to implement stream control and which has been reconstructed as a whole. As shown above, the cloud and the terminal are integrated into one system with coordinated modules. Several data-driven optimizations are performed on the central control module.

3.3.2 Stream push

Image description

This is a more detailed view of the network module for two scenarios: live streaming and communication. For live streaming, the upstream algorithm mainly aims to ensure definition and smoothness. For RTC communication, such as Tencent Meeting or VooV Meeting, the focus is on real-time delivery and smoothness, to eliminate high latency and lag.

3.3.3 Playback

Image description

Tencent Cloud delivers industry-leading playback performance in live streaming scenarios. It has a competitive CDN and has been constantly expanding into new scenarios, such as LEB. Besides standard browsers, LEB can use the SDK to deliver performance and effects in more formats at a latency of about one second, much better than browsers in demanding scenarios. In chat scenarios that require lower latency and stronger interaction, effort goes into making mic-on/off transitions smoother.

3.4 Improvement 4: TUI component library

Image description

The TUI component library has also been upgraded and rounded out. Instead of working with hundreds of APIs from professional PaaS components and putting up with an unsatisfactory final product, you can import the TUI library for each platform in a few minutes with a few lines of code. You can build a proper UI similar to those shown above within hours, even if you have never tried it before.

4. Summary

Image description

We've talked about the systematic design of component integration, where one plus one equals more than two.

In the cloud, we've successfully integrated three networks: the TRTC network, the IM network, and the CDN network.

On the terminal, existing features are continuously optimized in terms of stability and performance. For example, the squeeze theorem is applied in more scenarios and big data analysis cases to make the RTC SDK a leader in the industry in every respect. In addition, the LEB SDK and IM SDK with a new kernel will be integrated into the system to contribute to a powerful RT-Cube™ Media SDK architecture.

Thanks to the TUI component library with ready-to-use UI output, a strong and easy-to-use PaaS system is in place to offer more basic capability components for the all-true internet.

Image description

The RT-Cube™ Media SDK can be downloaded from the website as shown above. Currently, common versions are available, and custom capabilities will become available as the compilation system matures. You can freely combine different features to get the desired version.
