Introduction
I recently worked with my teammates on a college project to develop a fitness app using machine learning, so I would like to share some notes on the technical points.
The app we developed is deployed at the URL below. I was responsible for implementing the dashboard and the workout screen features.
App URL: https://www.bodybuddy.me
GitHub: https://github.com/vinsouza99/BodyBuddy.git
What I have learned
MediaPipe offers multiple pre-trained models for image processing. The pose detection model used in this project takes webcam footage as input and outputs the coordinates of 33 body landmarks, such as shoulders, knees, elbows, and heels. This model is also available as a JavaScript library and can be executed directly in the browser.
It is crucial to appropriately define which landmarks to track and the conditions under which a count is triggered to ensure a seamless user experience. Additionally, detection can become unstable if parts of the body go out of frame or if the user is too close to the camera, causing significant fluctuations in the coordinates. To address this, it is necessary to design mechanisms that account for such scenarios and prevent miscounts.
Furthermore, pose detection consumes substantial computational resources. For example, while it ran smoothly on a recent Mac, some stuttering was observed on slightly older Windows PCs. Therefore, it is recommended to clearly define the supported environments for reliable operation.
Technology Stack
- JavaScript
- React
- Chart.js
- MediaPipe
- Node.js
- Express.js
- Sequelize
- Supabase (Authentication, Storage, Postgres)
Demo
Code Description
Below is an overview of the processes performed while the workout screen is running. The flow is quite simple: the system takes the webcam feed as input and employs MediaPipe's machine-learning model to detect posture in real time. Specifically, the MediaPipe model estimates the coordinates of 33 body landmarks (such as shoulders and knees) from the video feed.
Next, these coordinates are overlaid on a canvas, with the landmarks visualized as dots and lines. Furthermore, logic is executed to calculate angles formed by specific landmarks or to measure the distance travelled by certain points. Based on these calculations, the system increments the count for movements (e.g., squats or push-ups) or sends alerts for incorrect posture.
This entire process is repeatedly executed frame by frame using requestAnimationFrame, enabling real-time analysis of movements.
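In skeletal form, the loop looks something like this (a minimal sketch; processFrame is a stand-in for the detection and drawing steps shown in the sections below):

let running = true;
const loop = () => {
  processFrame(); // detect pose, draw landmarks, update counts
  if (running) {
    window.requestAnimationFrame(loop); // schedule the next frame
  }
};
window.requestAnimationFrame(loop);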
Here are some key sections of the code, explained in a bit more detail.
import {
PoseLandmarker,
FilesetResolver,
DrawingUtils,
} from "@mediapipe/tasks-vision";
This code imports essential modules from @mediapipe/tasks-vision to enable pose detection and visualization. The PoseLandmarker is used to detect 33 body landmarks from images or videos in real time. The FilesetResolver manages and loads the resources, such as models and WASM files, that MediaPipe tasks require. Lastly, DrawingUtils provides functionality to render the detected landmarks and their connections onto a 2D canvas for visualization.
useEffect(() => {
const createPoseLandmarker = async () => {
try {
const vision = await FilesetResolver.forVisionTasks(
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0/wasm"
);
const poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
baseOptions: {
modelAssetPath: "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task",
delegate: "GPU",
},
runningMode: "VIDEO",
numPoses: 1,
});
poseLandmarkerRef.current = poseLandmarker;
} catch (error) {
console.error("Error loading PoseLandmarker:", error);
}
};
createPoseLandmarker();
}, []); // Empty dependency array: create the landmarker only once on mount
Within useEffect, we call createPoseLandmarker, which loads the required resources using FilesetResolver and creates a PoseLandmarker instance from the vision fileset and the specified options. The PoseLandmarker instance exposes methods for processing video or webcam frames in real time. The created instance is stored in poseLandmarkerRef for use in subsequent processing.
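One caveat: the landmarker holds WASM/GPU resources, so it is worth releasing them when the component unmounts. A minimal sketch of the same effect with a cleanup function added (assuming the refs used elsewhere in this post):

useEffect(() => {
  createPoseLandmarker();
  return () => {
    // Cancel any pending animation frame and release the model on unmount
    if (animationFrameIdRef.current) {
      window.cancelAnimationFrame(animationFrameIdRef.current);
    }
    poseLandmarkerRef.current?.close();
  };
}, []);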
const predictPosture = async () => {
if (!poseLandmarkerRef.current) {
console.error("PoseLandmarker is not initialized.");
return;
}
const videoElement = videoRef.current;
const canvasElement = canvasRef.current;
const canvasCtx = canvasElement.getContext("2d");
const drawingUtils = new DrawingUtils(canvasCtx);
canvasElement.width = videoElement.videoWidth;
canvasElement.height = videoElement.videoHeight;
clearCanvas();
try {
const results = await poseLandmarkerRef.current.detectForVideo(
videoElement,
performance.now()
);
if (!results?.landmarks?.length) return;
drawingUtils.drawConnectors(
results.landmarks[0],
PoseLandmarker.POSE_CONNECTIONS,
{ color: "#00FF00", lineWidth: 2 }
);
drawingUtils.drawLandmarks(
results.landmarks[0],
{ radius: 2, color: "#FF0000" }
);
const { count = 0, alert = "", calorie = 0, score = 0 } =
exerciseCounter?.processPose(results.landmarks[0]) || {};
// *** The actual code is omitted as it is very long. ***
// Add some processing using exerciseCounter's response
// For example:
// - Displaying count, calories burned, score earned
// - Showing alert when user's posture is not proper
} catch (error) {
console.error("Error during pose detection:", error);
}
if (webcamRunning) {
animationFrameIdRef.current = window.requestAnimationFrame(predictPosture);
}
};
This code is the main part of the posture detection. It is an asynchronous function that detects posture in real time from the webcam footage and performs additional processing based on the detected landmarks. First, it checks whether the PoseLandmarker required for posture detection has been initialized; if not, an error is logged and the function returns. Next, it sets up the webcam feed and canvas, adjusts the canvas size to match the video dimensions, and clears any previous drawings.
The function then uses the detectForVideo method to perform posture detection; the 33 landmarks are returned as detectForVideo's result. These landmarks are used to draw points and connections on the canvas, visually representing the body's shape. Additionally, although omitted in the code above, extra processing is performed based on the landmarks, such as counting movements, identifying errors in posture, and calculating metrics like calories burned and scores.
If an error occurs during posture detection, it is logged to the console so the process can continue safely. Finally, the function uses requestAnimationFrame to call itself repeatedly, enabling continuous real-time processing of video frames. This workflow allows real-time posture detection and feedback based on webcam footage.
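For context, here is a rough sketch of how the webcam feed can be started before the first call to predictPosture (the enableWebcam name is illustrative; the app's actual setup code is omitted here):

const enableWebcam = async () => {
  // Request camera access and attach the stream to the <video> element
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  videoRef.current.srcObject = stream;
  // Start the detection loop once the video has frames to process
  videoRef.current.addEventListener("loadeddata", predictPosture);
};

As for the detection output, results.landmarks contains data in the following form for a single detected pose: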
[
[
{
"x": 0.4226442873477936,
"y": 0.24005821347236633,
"z": -0.21633145213127136,
"visibility": 0.9985895752906799
},
{
"x": 0.4282989799976349,
"y": 0.22500008344650269,
"z": -0.1938890814781189,
"visibility": 0.9979572296142578
},
{
"x": 0.4316709041595459,
"y": 0.2257886528968811,
"z": -0.19421276450157166,
"visibility": 0.9979007244110107
},
{
"x": 0.4349389374256134,
"y": 0.22702443599700928,
"z": -0.1943206638097763,
"visibility": 0.9980199337005615
},
{
"x": 0.41542500257492065,
"y": 0.22407883405685425,
"z": -0.20759187638759613,
"visibility": 0.9975939393043518
},
{
"x": 0.40978389978408813,
"y": 0.22423318028450012,
"z": -0.20759187638759613,
"visibility": 0.9974191188812256
},
{
"x": 0.40484046936035156,
"y": 0.22473159432411194,
"z": -0.2076997607946396,
"visibility": 0.9975274205207825
},
{
"x": 0.432990700006485,
"y": 0.24122315645217896,
"z": -0.07725352048873901,
"visibility": 0.9970536231994629
},
{
"x": 0.39492490887641907,
"y": 0.2370660901069641,
"z": -0.13616472482681274,
"visibility": 0.9975370168685913
},
{
"x": 0.4275425970554352,
"y": 0.2611766755580902,
"z": -0.1707993447780609,
"visibility": 0.9989877343177795
},
{
"x": 0.41113120317459106,
"y": 0.2596157491207123,
"z": -0.18806269764900208,
"visibility": 0.9990490078926086
},
{
"x": 0.4555538594722748,
"y": 0.3702540993690491,
"z": 0.013790455646812916,
"visibility": 0.9987744688987732
},
{
"x": 0.3487405776977539,
"y": 0.3495810627937317,
"z": -0.14501219987869263,
"visibility": 0.9992875456809998
},
{
"x": 0.46265295147895813,
"y": 0.5064929723739624,
"z": 0.07860222458839417,
"visibility": 0.774071455001831
},
{
"x": 0.3323880136013031,
"y": 0.49474000930786133,
"z": -0.1699361801147461,
"visibility": 0.9865716695785522
},
{
"x": 0.4919297993183136,
"y": 0.5954675674438477,
"z": 0.0021107152570039034,
"visibility": 0.8174288272857666
},
{
"x": 0.4082949161529541,
"y": 0.5761870741844177,
"z": -0.22895526885986328,
"visibility": 0.9741412401199341
},
{
"x": 0.5029866695404053,
"y": 0.6259068250656128,
"z": -0.012617086060345173,
"visibility": 0.7789857983589172
},
{
"x": 0.4148033857345581,
"y": 0.6108712553977966,
"z": -0.27060312032699585,
"visibility": 0.9540517926216125
},
{
"x": 0.5001327991485596,
"y": 0.6205637454986572,
"z": -0.05518879368901253,
"visibility": 0.7929456830024719
},
{
"x": 0.4252701997756958,
"y": 0.5970830917358398,
"z": -0.26628729701042175,
"visibility": 0.9539660811424255
},
{
"x": 0.4919487535953522,
"y": 0.6099179983139038,
"z": -0.016885722056031227,
"visibility": 0.8022459149360657
},
{
"x": 0.4270872175693512,
"y": 0.5896612405776978,
"z": -0.22722893953323364,
"visibility": 0.94629967212677
},
{
"x": 0.4176270067691803,
"y": 0.6360388398170471,
"z": 0.04059586301445961,
"visibility": 0.9976218342781067
},
{
"x": 0.3522874414920807,
"y": 0.6387805938720703,
"z": -0.04035309702157974,
"visibility": 0.9984330534934998
},
{
"x": 0.4253230690956116,
"y": 0.8206213712692261,
"z": 0.061069127172231674,
"visibility": 0.8570517301559448
},
{
"x": 0.36017337441444397,
"y": 0.8413680791854858,
"z": -0.017398227006196976,
"visibility": 0.9662834405899048
},
{
"x": 0.4192449450492859,
"y": 0.9211388826370239,
"z": 0.2934770882129669,
"visibility": 0.6253237724304199
},
{
"x": 0.3750242292881012,
"y": 1.0136358737945557,
"z": 0.12494354695081711,
"visibility": 0.8725647330284119
},
{
"x": 0.41356179118156433,
"y": 0.9699124097824097,
"z": 0.3161352276802063,
"visibility": 0.3453260064125061
},
{
"x": 0.3750942647457123,
"y": 1.0250861644744873,
"z": 0.13540945947170258,
"visibility": 0.4877343773841858
},
{
"x": 0.44083380699157715,
"y": 1.0195741653442383,
"z": 0.24557125568389893,
"visibility": 0.6147988438606262
},
{
"x": 0.3841099143028259,
"y": 1.0800800323486328,
"z": 0.014930106699466705,
"visibility": 0.788257360458374
}
]
]
The value found in results.landmarks takes the form shown above: for each detected pose, an array of 33 points, each with normalized x, y, and z coordinates plus a visibility score. These 33 points map to body parts as shown below.
[Image: the 33 pose landmark positions, from the MediaPipe official page]
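Since each landmark is identified only by its position in the array, it helps to name the indices you care about. The numbering below follows MediaPipe's documented pose landmark indices (the constant names themselves are my own):

// MediaPipe pose landmark indices (per the official documentation)
const LEFT_SHOULDER = 11;
const LEFT_HIP = 23;
const LEFT_KNEE = 25;
const LEFT_ANKLE = 27;

// e.g. the left knee of the first detected person
const leftKnee = results.landmarks[0][LEFT_KNEE];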
const loadExerciseCounter = (selectedExercise) => {
if (!selectedExercise) return null;
const CounterClass = exerciseCounterLoader[selectedExercise.exercise_id];
if (!CounterClass) {
console.error(
"Exercise counter is not implemented for:",
selectedExercise.name
);
return null;
}
return new CounterClass();
};
Regarding the exercise counter classes, the app dynamically loads the appropriate counter class for the exercise selected by the user. A baseCounter base class is prepared, and specific exercises, such as squats and push-ups, implement their own logic by extending it. This design makes it easy to add new exercises.
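The exerciseCounterLoader referenced above is essentially a lookup table from exercise IDs to counter classes. Here is a minimal sketch of this design (the class bodies and the ID are illustrative, not the app's exact code):

class BaseCounter {
  constructor() {
    this.successCount = 0;
    this.up = false;
    this.down = false;
  }
  // Each exercise implements its own counting logic
  processPose(landmarks) {
    throw new Error("processPose must be implemented by a subclass");
  }
}

class SquatCounter extends BaseCounter {
  processPose(landmarks) {
    // Angle-based squat logic (shown in the next section)
    return { count: this.successCount, alert: "", calorie: 0, score: 0 };
  }
}

// Maps exercise IDs to their counter classes
const exerciseCounterLoader = {
  1: SquatCounter, // illustrative ID
};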
const TOP_ANGLE_THRESHOLD = 170;
const BOTTOM_ANGLE_THRESHOLD = 100;
#processCount(leftShoulder, leftHip, leftKnee, leftAnkle) {
this.shoulderHipKneeAngle = calculateAngle(leftShoulder, leftHip, leftKnee);
this.hipKneeAnkleAngle = calculateAngle(leftHip, leftKnee, leftAnkle);
// Judge Squat Up (top position)
if (this.shoulderHipKneeAngle > TOP_ANGLE_THRESHOLD && !this.up) {
this.up = true;
this.down = false;
console.log("Top position reached.");
}
// Judge Squat Down (bottom position)
if (this.up && this.hipKneeAnkleAngle < BOTTOM_ANGLE_THRESHOLD && !this.down) {
this.down = true;
this.successCount += 1;
console.log("Bottom position reached. Count:", this.successCount);
}
// Reset Up state when user returns to the top position
if (this.down && this.shoulderHipKneeAngle > TOP_ANGLE_THRESHOLD) {
this.up = false;
console.log("Reset to start position.");
}
}
This is an excerpt from the main logic of the squat counter class. The method calculates the angles formed by key body landmarks (shoulder, hip, knee, and ankle) to determine the user's position and count the repetitions.
The logic starts by calculating two critical angles: the shoulder-hip-knee angle and the hip-knee-ankle angle. These angles are used to detect the top and bottom positions of the squat. When the shoulder-hip-knee angle exceeds the defined TOP_ANGLE_THRESHOLD (170 degrees), and the user is not already marked as "up," the system identifies the user as having reached the top position. At this point, it resets the "down" state to prepare for tracking the downward motion.
Next, the system detects the bottom position when the hip-knee-ankle angle drops below the BOTTOM_ANGLE_THRESHOLD (100 degrees) while the user is marked as "up." Upon reaching this position, the system increments the squat count, logs the event, and marks the user as "down."
Finally, when the user returns to the top position with a shoulder-hip-knee angle exceeding the TOP_ANGLE_THRESHOLD after being marked as "down," the system resets the "up" state, allowing the cycle to repeat for the next squat.
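To tie this back to the landmark array, the counter's public processPose method can pick out the four left-side landmarks by index and delegate to #processCount. A sketch under the assumptions above:

processPose(landmarks) {
  this.#processCount(
    landmarks[11], // left shoulder
    landmarks[23], // left hip
    landmarks[25], // left knee
    landmarks[27]  // left ankle
  );
  return { count: this.successCount, alert: "", calorie: 0, score: 0 };
}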
export const calculateAngle = (a, b, c) => {
const radians =
Math.atan2(c.y - b.y, c.x - b.x) - Math.atan2(a.y - b.y, a.x - b.x);
let angle = Math.abs((radians * 180.0) / Math.PI);
if (angle > 180.0) angle = 360 - angle;
return angle;
};
The angle calculation is performed by the code above. It returns the angle at the middle point b, in degrees between 0 and 180, formed by the segments from b to a and from b to c.
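For instance, three points that form a right angle at b return 90:

const a = { x: 0, y: 1 };
const b = { x: 0, y: 0 };
const c = { x: 1, y: 0 };
console.log(calculateAngle(a, b, c)); // 90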
_isLandmarkUnstable(landmarks) {
  if (!this.previousLandmarks) {
    this.previousLandmarks = landmarks;
    return false; // First frame, no comparison possible
  }
  let unstable = false;
  landmarks.forEach((landmark, index) => {
    const prevLandmark = this.previousLandmarks[index];
    const dx = landmark.x - prevLandmark.x;
    const dy = landmark.y - prevLandmark.y;
    const distance = Math.sqrt(dx * dx + dy * dy);
    if (distance > this.movementThreshold) {
      unstable = true;
    }
  });
  // Update previous landmarks for the next comparison
  this.previousLandmarks = landmarks;
  return unstable;
}
Lastly, this is a mechanism to prevent false detections. MediaPipe has an issue where landmark coordinates can become significantly unstable if the user is too close to the camera or if parts of the body are outside the frame. To avoid incorrect counting under such conditions, I added logic that ignores a frame's results when the coordinates move too far between consecutive frames.
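The movementThreshold used in this check is a tuning parameter. Since the landmark coordinates are normalized to the 0-1 range, the threshold expresses how far a point may jump between two consecutive frames as a fraction of the frame size. A sketch of how it might be initialized in the counter's constructor (the 0.1 value is illustrative, not the app's actual setting):

constructor() {
  super();
  this.previousLandmarks = null;
  // Maximum normalized distance a landmark may move between two
  // consecutive frames before the frame is considered unstable
  this.movementThreshold = 0.1; // illustrative value
}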
That's all for this blog post. This time I used the pre-trained models provided by MediaPipe, but next time I would like to incorporate my own customised models into the application.