Vivi-clevercoder

Posted on Jul 27, 2021

How a Programmer Developed a Live-Streaming App with Gesture-Controlled Virtual Backgrounds

#machinelearning #virtual

"What's it like to date a programmer?"

John is a Huawei programmer. His girlfriend Jenny, a teacher, has an interesting answer to that question: "Thanks to my programmer boyfriend, my course ranked among the most popular online courses at my school."

Let's go over how this came to be. Due to COVID-19, the school where Jenny taught moved entirely online. Jenny, who was new to live streaming, wanted her students to feel as if they were traveling to Tokyo, New York, Paris, the Forbidden City, Catherine Palace, and the Louvre Museum, so that they could absorb the geography and history of each place. But how could she do that?

Jenny was stuck on this issue, but John quickly came to her rescue.

After analyzing her requirements in detail, John developed a tailored online course app that gives its users an uncannily immersive experience. It lets users change the background while live streaming. The video imagery within the app looks true-to-life because every pixel is labeled, so the entire body image, down to a single strand of hair, is cut out cleanly.

How to Implement

Gesture-controlled background switching can be built with two services in HUAWEI ML Kit, image segmentation and hand gesture recognition:

The image segmentation service segments specific elements from static images or dynamic video streams. It supports 11 types of image elements: human bodies, sky scenes, plants, foods, cats and dogs, flowers, water, sand, buildings, mountains, and others.

The hand gesture recognition service offers two capabilities: hand keypoint detection and hand gesture recognition. Hand keypoint detection detects 21 hand keypoints (including fingertips, knuckles, and wrists) and returns their positions. Hand gesture recognition detects the rectangular areas that contain hands in images and videos, and returns the position of each area together with the type and confidence of the gesture. It can recognize 14 different gestures, including thumbs-up, thumbs-down, the OK sign, fist, finger heart, and the number gestures from 1 to 9. Both capabilities work on static images and real-time video streams.
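
To make the per-pixel idea concrete, here is a minimal sketch of how a segmentation mask could be used to place a person on a new background. It is plain Java rather than ML Kit code, and it rests on assumptions: the class name MaskCompositor is hypothetical, the pixels are packed ARGB integers, and the mask holds one value per pixel between 0 (background) and 1 (person). How you obtain such a mask from the segmentation result depends on the SDK and is not shown.

```java
/** Hypothetical helper for illustration; not part of HUAWEI ML Kit. */
public final class MaskCompositor {
    private MaskCompositor() {}

    /**
     * Blends each camera pixel over the matching virtual-background pixel.
     * mask[i] is assumed to be the weight (0..1) of the person at pixel i.
     */
    public static int[] composite(int[] foreground, int[] background, float[] mask) {
        if (foreground.length != background.length || foreground.length != mask.length) {
            throw new IllegalArgumentException("All buffers must have the same length");
        }
        int[] out = new int[foreground.length];
        for (int i = 0; i < out.length; i++) {
            // Clamp the weight so that out-of-range mask values cannot overflow a channel.
            float a = Math.max(0f, Math.min(1f, mask[i]));
            int r = blend(foreground[i] >> 16, background[i] >> 16, a);
            int g = blend(foreground[i] >> 8, background[i] >> 8, a);
            int b = blend(foreground[i], background[i], a);
            // The output is always fully opaque.
            out[i] = 0xFF000000 | (r << 16) | (g << 8) | b;
        }
        return out;
    }

    /** Linear blend of one 8-bit channel; only the lowest 8 bits of each input are used. */
    private static int blend(int fg, int bg, float a) {
        int f = fg & 0xFF;
        int b = bg & 0xFF;
        return Math.round(f * a + b * (1f - a));
    }
}
```

A fractional mask value softens the edge between person and background, which is what keeps fine details such as hair from looking jagged.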

Development Process

1. Add the AppGallery Connect plugin and the Maven repository.

buildscript {
    repositories {
        google()
        jcenter()
        maven {url 'https://developer.huawei.com/repo/'}
    }
    dependencies {
        ...
        classpath 'com.huawei.agconnect:agcp:1.4.1.300'
    }
}

allprojects {
    repositories {
        google()
        jcenter()
        maven {url 'https://developer.huawei.com/repo/'}
    }
}

2. Integrate the required services in full SDK mode.

dependencies {
    // Import the basic SDK of image segmentation.
    implementation 'com.huawei.hms:ml-computer-vision-segmentation:2.0.4.300'
    // Import the multiclass segmentation model package.
    implementation 'com.huawei.hms:ml-computer-vision-image-segmentation-multiclass-model:2.0.4.300'
    // Import the human body segmentation model package.
    implementation 'com.huawei.hms:ml-computer-vision-image-segmentation-body-model:2.0.4.300'
    // Import the basic SDK of hand gesture recognition.
    implementation 'com.huawei.hms:ml-computer-vision-handkeypoint:2.0.4.300'
    // Import the model package of hand keypoint detection.
    implementation 'com.huawei.hms:ml-computer-vision-handkeypoint-model:2.0.4.300'
}

3. Apply the AppGallery Connect plugin in the file header of the app-level build.gradle file.

Add apply plugin: 'com.huawei.agconnect' after apply plugin: 'com.android.application'.

4. Automatically update the machine learning model.
Add the following statements to the AndroidManifest.xml file:

<manifest ...>
    <application ...>
        ...
        <meta-data
            android:name="com.huawei.hms.ml.DEPENDENCY"
            android:value="imgseg,handkeypoint" />
        ...
    </application>
</manifest>

5. Create the image segmentation and hand keypoint analyzers, and combine them into a composite analyzer.

// Image segmentation analyzer.
MLImageSegmentationAnalyzer imageSegmentationAnalyzer = MLAnalyzerFactory.getInstance().getImageSegmentationAnalyzer();
// Hand gesture recognition analyzer.
MLHandKeypointAnalyzer handKeypointAnalyzer = MLHandKeypointAnalyzerFactory.getInstance().getHandKeypointAnalyzer();

// Combine both analyzers so that the same camera frames are passed to each of them.
MLCompositeAnalyzer analyzer = new MLCompositeAnalyzer.Creator()
    .add(imageSegmentationAnalyzer)
    .add(handKeypointAnalyzer)
    .create();

6. Create a class for processing the recognition result.

public class ImageSegmentAnalyzerTransactor implements MLAnalyzer.MLTransactor<MLImageSegmentation> {
    @Override
    public void transactResult(MLAnalyzer.Result<MLImageSegmentation> results) {
        SparseArray<MLImageSegmentation> items = results.getAnalyseList();
        // Process the recognition result as required. Handle only the detection results here;
        // do not call other ML Kit detection APIs from this callback.
    }
    @Override
    public void destroy() {
        // Callback method used to release resources when the detection ends.
    }
}

public class HandKeypointTransactor implements MLAnalyzer.MLTransactor<List<MLHandKeypoints>> {
    @Override
    public void transactResult(MLAnalyzer.Result<List<MLHandKeypoints>> results) {
        SparseArray<List<MLHandKeypoints>> analyseList = results.getAnalyseList();
        // Process the recognition result as required. Handle only the detection results here;
        // do not call other ML Kit detection APIs from this callback.
    }
    @Override
    public void destroy() {
        // Callback method used to release resources when the detection ends.
    }
}
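
What "process the recognition result as required" means is up to the app. As one possible sketch, the class below turns a stream of gesture results into background changes. It is plain Java and makes several assumptions: the class name GestureBackgroundSwitcher is hypothetical, the gesture labels "THUMBS_UP" and "THUMBS_DOWN" are placeholder strings rather than ML Kit constants, and the transactor is assumed to have already reduced each detection to a label and a confidence value. A confidence threshold and a cooldown keep a single held gesture from flipping through every background in a few frames.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of HUAWEI ML Kit. */
public final class GestureBackgroundSwitcher {
    private final List<String> backgrounds;
    private final float minConfidence;
    private final long cooldownMillis;
    private int currentIndex = 0;
    private long lastSwitchMillis = 0L;
    private boolean switchedBefore = false;

    public GestureBackgroundSwitcher(List<String> backgrounds, float minConfidence, long cooldownMillis) {
        if (backgrounds.isEmpty()) {
            throw new IllegalArgumentException("At least one background is required");
        }
        this.backgrounds = new ArrayList<>(backgrounds);
        this.minConfidence = minConfidence;
        this.cooldownMillis = cooldownMillis;
    }

    /**
     * Feeds one gesture result and returns the background to display afterwards.
     * The gesture labels are placeholders chosen for this sketch.
     */
    public String onGesture(String gesture, float confidence, long nowMillis) {
        boolean confident = confidence >= minConfidence;
        boolean cooledDown = !switchedBefore || nowMillis - lastSwitchMillis >= cooldownMillis;
        if (confident && cooledDown) {
            int step = 0;
            if ("THUMBS_UP".equals(gesture)) {
                step = 1;   // next background
            } else if ("THUMBS_DOWN".equals(gesture)) {
                step = -1;  // previous background
            }
            if (step != 0) {
                // floorMod wraps around in both directions without going negative.
                currentIndex = Math.floorMod(currentIndex + step, backgrounds.size());
                lastSwitchMillis = nowMillis;
                switchedBefore = true;
            }
        }
        return backgrounds.get(currentIndex);
    }
}
```

In an app, transactResult in the hand gesture transactor could call onGesture with the current time and hand the returned name to whatever code draws the background.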

7. Bind each analyzer to its result processor.

imageSegmentationAnalyzer.setTransactor(new ImageSegmentAnalyzerTransactor());
handKeypointAnalyzer.setTransactor(new HandKeypointTransactor());

8. Create a LensEngine object.

Context context = this.getApplicationContext();
LensEngine lensEngine = new LensEngine.Creator(context, analyzer)
    // Select the camera: LensEngine.BACK_LENS is the rear camera, and LensEngine.FRONT_LENS is the front camera.
    .setLensType(LensEngine.FRONT_LENS)
    .applyDisplayDimension(1280, 720)
    .applyFps(20.0f)
    .enableAutomaticFocus(true)
    .create();

9. Start the camera, read video streams, and start recognition.

// The remaining SurfaceView logic is left for you to implement.
SurfaceView mSurfaceView = new SurfaceView(this);
try {
    lensEngine.run(mSurfaceView.getHolder());
} catch (IOException e) {
    // Exception handling logic.
}

10. Stop the analyzer and release the recognition resources when recognition ends.

if (analyzer != null) {
    try {
        analyzer.stop();
    } catch (IOException e) {
        // Exception handling.
    }
}
if (lensEngine != null) {
    lensEngine.release();
}

For more information, please visit:

HUAWEI Developers official website

Development Guide

Reddit to join developer discussions

GitHub or Gitee to download the demo and sample code

Stack Overflow to solve integration problems
