Ritabrata Das for GNU/Linux Users' Group, NIT Durgapur


Control Google Meet With Expressions/Gestures

With the advancements made in the field of modern-day machine learning, it has become quite easy and flexible to develop models and applications in this domain. Still, you might have the perception that the realm of machine learning requires complex algorithms and deep expert knowledge for its projects.

However, these days it is quite easy to develop them, as most of the standard algorithms and models are made available on the Internet in the form of web applications. In this project, we will use Teachable Machine along with TensorFlow.js to control basic actions in Google Meet, such as muting your microphone and switching off your video, plus a special action that will be revealed later in the article.


Teachable Machine

Teachable Machine is an online web application, created by Google Creative Labs, that facilitates the easy and fast creation of machine learning models accessible to everyone on the net. It is flexible, as it can make use of pre-existing files or even capture live examples, which are recorded as instances on the basis of which references are made. You can even choose to use your model entirely on-device, without any webcam or microphone data being sent outside your system.
The advantage is that you train your computer to recognize images, sounds, and poses of your personal choice, without writing any complicated or long machine learning code. Then, you can use your model in your own projects, applications, websites and more. It is powered by TensorFlow.js, an open-source library created by Google for machine learning.

Prerequisites

All you require is a basic knowledge of Vanilla JavaScript, a webcam, an Internet connection and the passion to learn something new. Then follow the steps below and get your own functional model developed in front of your eyes.

The Project

The project takes the form of a Chrome browser extension, which controls the basic actions on Google Meet. The article has been divided into the following steps to facilitate your understanding:

Step 1: Training your Model

The first step involves creating the base model on Teachable Machine. The model can be built from images, body poses or sounds of your choice. In this project, we have decided to use the pose model for training our computer; however, you can also use images or speech to train your model instead. Make sure the poses or gestures you wish to use are distinct from your natural movements and can be performed consistently.

Now, to start with, head over to the Teachable Machine site and click on Get Started. You can also reach it from the hamburger menu at the top-left corner of the homepage. You will be taken to the project-selection page.

You will now get the options for creating your model online to train your machine. There are three different project types, namely Image Project, Audio Project and Pose Project. For our model, we have used the Pose Project. You can also import previously saved models from your Google Drive or local file system if you want to reuse and modify a model that was created beforehand. If you want a new model, just click on Pose Project. The site will then redirect you to your project workspace.

First, you have to record your images in the form of "classes". The images can be captured live via webcam or uploaded from the local file system. The image samples within a particular class should be consistent and repetitive in nature, while the samples used across different classes should be clearly different, so that each class can be defined more accurately and recognized with ease.

It is advisable to record at least 50 to 60 images to increase the accuracy of the recorded class. In order to record live instances, grant permission to your webcam and press the Hold to Record button to capture your live images.

If you want a free tutorial on how to record your model in the form of images, you can head over to the Teachable Machine Tutorial and click on "Let's Go" for a live demo.

You can change the labels of the classes as you wish. In this project, we have made use of three classes, namely Audio, Video and Escape, which are recorded live via webcam. The images in these classes can be specific hand gestures, facial expressions or particular objects, which can be identified and recognized by the machine easily.
To add more than two classes, click on Add Class. After recording the image samples for the respective classes, it is time to train and export your model.

Step 2: Exporting your Model


After you have created your respective classes, click on Train Model, which will train your machine to recognize the content of the images so that it can be used in the upcoming steps. Teachable Machine mainly uses pattern recognition algorithms, which involve the use of:

  • Statistical Techniques
  • Structural Techniques
  • Template Matching
  • Neural Network Approach
  • Fuzzy Model
  • Hybrid Models

After your model has been trained, you can preview it before exporting. This lets you test the model before implementing it in your projects. Just perform the poses you recorded and check whether the machine can identify them or not.

When your model has been prepared, click on Export Model and export the model in the TensorFlow.js format. Click on Upload my model, which will publish your model online and generate a URL where your model will be hosted by Teachable Machine for free. You can also download it in the TensorFlow or TensorFlow Lite format for local use.

The model used in the project is live at: Project Model

Step 3: Preparing the manifest.json

The next step involves creating a Chrome browser extension that attaches the model to the Google Meet page. For this, we have to create a manifest.json file in the extension directory, which will be loaded into the browser. Let's have a look at the JSON file:

{
    "name": "Gmeet_Controller",
    "description": "An extension to control Google Meet Actions using hand/facial gestures.",
    "permissions": ["activeTab", "storage", "tabs", "notifications"],
    "version": "1.0",
    "manifest_version": 3,
    "content_scripts": [{
        "matches": [
            "https://meet.google.com/*"
        ],
        "js": [
            "src/tf.min.js",
            "src/teachablemachine-pose.min.js",
            "src/background.js"
        ]
    }]
}

You can set the name, description and version according to your choice; this is the information that will be displayed when you load the extension into the browser. You can also store icons in a separate directory inside your extension directory and render them in the browser. The Google Meet URL needs to be specified under "matches" in content_scripts. You can also trim the permissions if you do not want the extension to have unnecessary access to your system storage, notifications, etc.
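
For instance, a minimal sketch of such an icons entry, assuming the image files are kept in a hypothetical icons/ folder inside the extension directory, could be added to manifest.json like this:

"icons": {
    "16": "icons/icon16.png",
    "48": "icons/icon48.png",
    "128": "icons/icon128.png"
}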

Step 4: Linking your models to Google Meet

Now create a separate directory to store the javascript files (here src), which will render the functionality of the extension. Download the latest and updated version of tf.min.js and teachablemachine-pose.min.js, and place them in the directory.

Note: This project can also be built using the npm/yarn packages "@teachablemachine/pose" and "@tensorflow/tfjs", or by pulling them from their CDNs. However, we have used Vanilla JS to keep the project simple and beginner-friendly.
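
If you prefer the package-based route, a minimal sketch of that setup, assuming a bundler such as webpack (which is not part of this article's setup), would look roughly like this:

// Hypothetical package-based alternative (not used in this article):
// npm install @tensorflow/tfjs @teachablemachine/pose
import * as tf from "@tensorflow/tfjs";
import * as tmPose from "@teachablemachine/pose";

// tmPose.load() and tmPose.Webcam are then used exactly as in the
// Vanilla JS version shown below.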

After importing the libraries, it is time to write background.js. First, declare a constant named URL which will hold the model URL that was generated by Teachable Machine.

 const URL = "https://teachablemachine.withgoogle.com/models/<MODEL_ID>/";
// Use your own personalized model here

Then you have to write the basic JavaScript to set the frame height and width of the webcam, request access to the webcam and keep updating the webcam frame.

let model, webcam, maxPredictions;

async function init() {
    // The exported Teachable Machine model is described by these two files
    const modelURL = URL + "model.json";
    const metadataURL = URL + "metadata.json";

    model = await tmPose.load(modelURL, metadataURL);
    maxPredictions = model.getTotalClasses();

    // Set up a 200x200, mirrored webcam feed
    const size = 200;
    const flip = true;
    webcam = new tmPose.Webcam(size, size, flip);
    await webcam.setup();   // requests webcam access
    await webcam.play();
    window.requestAnimationFrame(loop);

    document.getElementById("webcam-container").appendChild(webcam.canvas);
}

async function loop(timestamp) {
    // Grab the latest webcam frame, run a prediction, then schedule the next frame
    webcam.update();
    await predict();
    window.requestAnimationFrame(loop);
}

Now we have to write the functions that identify the event on the webcam, compare it with the classes of the model and, if the probability of an event exceeds the threshold probability, execute the corresponding function.


// The class names below (".VfPpkd-..." and "HNeRed") correspond to Google Meet's
// microphone/camera toolbar buttons at the time of writing and may change
// whenever Google updates the Meet UI.
function Audio(probability) {
    if (probability >= 1) {
        const audioButton = document.querySelectorAll(".VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.tWDL4c.uaILN")[0];
        if (audioButton.classList.contains("HNeRed")) {
            audioButton.click();
        }
    }
}

function Video(probability) {
    if (probability >= 1) {
        const videoButton = document.querySelectorAll(".VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.tWDL4c.uaILN")[1];
        if (videoButton.classList.contains("HNeRed")) {
            videoButton.click();
        }
    }
}

function Escape(probability) {
    if (probability >= 1) {
        const buttons = document.querySelectorAll(".VfPpkd-Bz112c-LgbsSe.yHy1rc.eT1oJ.tWDL4c.uaILN");
        const Button0 = buttons[0];
        const Button1 = buttons[1];
        if (Button0.classList.contains("HNeRed")) {
            Button0.click();
        }
        if (Button1.classList.contains("HNeRed")) {
            Button1.click();
        }
    }
}

async function predict() {
    // Estimate the pose in the current webcam frame and classify it
    const { pose, posenetOutput } = await model.estimatePose(webcam.canvas);
    const prediction = await model.predict(posenetOutput);

    const predictionsArray = prediction.map(function (o) {
        return { probability: o.probability.toFixed(2), event: o.className };
    });

    // Pick the class with the highest probability
    let max = predictionsArray[0].probability;
    let event = predictionsArray[0].event;
    for (let i = 1; i < predictionsArray.length; i++) {
        if (predictionsArray[i].probability > max) {
            max = predictionsArray[i].probability;
            event = predictionsArray[i].event;
        }
    }

    if (event === "Audio") {
        Audio(max);
    } else if (event === "Video") {
        Video(max);
    } else if (event === "Escape") {
        Escape(max);
    }
}


// Create a container for the webcam canvas on the Meet page and start everything
const webcamContainer = document.createElement("div");
webcamContainer.id = "webcam-container";
document.body.appendChild(webcamContainer);

init();

Open Google Chrome and type "chrome://extensions" in the address bar to navigate to the Extensions page. Now toggle on Developer mode and click on Load unpacked. Then select the directory where your manifest.json is stored. The extension is now ready to work.

Now you can have your own personalized model working on your system!

Demo

Here you can take a short look at the working model of the project:


Resources

You can further refer to the following documentation and tutorials to know more about the libraries and technologies:

Credits: Arju S. Moon

You can find the GitHub repository for the above project here:

RitabrataDas343 / GMeet_Controller

This is a browser extension used for controlling Google Meet using hand/facial expressions.


Hope you have found this article resourceful.

Have a go through the following links to know more about us and keep yourself updated with the latest stuff:

Linux Facebook Instagram LinkedIn

Do drop a like on the post and comment down below if you liked the idea and are interested in exploring the domain of machine learning along with us. Any kind of suggestions and propositions are appreciated.

May The Source Be With You! 🐧❤️


Top comments (1)

KuroNeko

This is amazing. May I know how you bypass the CSP? When I tested it on my own laptop, it showed the following. How did you guys fix that?

"Uncaught EvalError: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script in the following Content Security Policy directive: "script-src 'self' 'wasm-unsafe-eval'".