There is a game called slither.io that I enjoy playing. It’s kinda hard to play it on my phone (curse my small hands and the huge phone), and playing it with the built-in touchpad of my laptop isn’t comfortable either. This is how the idea of a virtual mouse came to my mind.
Controlling the mouse pointer with gestures needs three parts. Firstly, you need some sort of input that decides which direction the pointer should go. Secondly, that input needs to be interpreted and processed. And finally, the actual movement of the mouse pointer according to those inputs has to be executed. Hand gestures can easily be captured with the webcam of my laptop. A very close friend recently used MediaPipe to interpret sign language from hand movements, and I thought, why don’t I try using that library too? And for moving the mouse, a few Google searches made me want to use pyautogui.
Now we start writing the code. I used pip to install the above libraries according to their documentation.
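If you want to follow along, installing them should look something like this (the pip package that provides cv2 is published as opencv-python):
pip install opencv-python mediapipe pyautogui
After installing, the first thing we do is import them.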
import cv2
import mediapipe as mp
import pyautogui
Taking input from the webcam:
The first step is simple. We gotta capture the video from the webcam.
video = cv2.VideoCapture(0)
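If the window later shows nothing at all, it might help to check that the webcam actually opened. This is just an optional sanity check, not something the original script needs:
if not video.isOpened():
    raise RuntimeError('Could not open the webcam')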
Understanding MediaPipe:
Now we gotta use this video to detect movement. I wanted my mouse pointer to follow the direction of my index finger, and MediaPipe is a great option for identifying these kinds of gestures. In order to use the library, we need to understand how it identifies them. MediaPipe can detect the movements of your eyes or hands, or your posture in general, by identifying some important points on your body. These points are called landmarks. Let’s have a look at the landmarks of our hands.
In MediaPipe’s hand-landmark diagram, landmark 8 is the tip of the index finger. So if we want to track the motion of our index finger, we need to find out what landmark 8 is doing at any given moment. It might look a bit complex, but the library already does the work of identifying these points for you. You can create a hand object from the ‘Hands’ class of this library and use it to analyse your movements like this:
handGesture = mp.solutions.hands.Hands()
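By the way, the Hands constructor also accepts a few optional parameters. If I remember the API correctly, you could restrict detection to a single hand and tune the confidence threshold with something like this (the values here are just illustrative):
handGesture = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)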
You might also need the drawing utilities from MediaPipe if you wanna draw the landmarks of your hand on the output screen.
drawingTools = mp.solutions.drawing_utils
The ‘loop’:
Now, as you are taking input from the webcam, you have to process that input over and over, once for every frame, so you need a continuous loop. It might look somewhat like this:
while True:
    <read the captured video>
    <format the video>
    <do something to detect the landmark 8>
    <move the mouse pointer according to the movements of landmark 8>
    cv2.imshow('Virtual Mouse', frame)
    cv2.waitKey(1)
Here imshow creates a window to show your video output, and waitKey waits 1 ms for a key press, which also gives the window time to respond to events.
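One optional refinement (my code below skips it): waitKey also returns the code of the key that was pressed, so you could use it to break out of the loop and clean up when you’re done. A rough sketch:
while True:
    # ... read, process and show the frame as before ...
    cv2.imshow('Virtual Mouse', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
video.release()            # give the webcam back to the OS
cv2.destroyAllWindows()    # close the output window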
Reading and formatting the video:
You’ve already got the ‘video’ variable where you kept your input video from the webcam.
while True:
    _, frame = video.read()
    frame = cv2.flip(frame, 1)
    rgbConvertedFrame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    <do something to detect the landmark 8>
    <move the mouse pointer according to the movements of landmark 8>
    cv2.imshow('Virtual Mouse', frame)
    cv2.waitKey(1)
The video.read() call gives us two things: a boolean that says whether the reading was successful, and the actual frame data. I don’t particularly want to do anything with the boolean, so it goes into the ‘_’ placeholder. I found the video came out flipped in the output window, so I added the cv2.flip() function to flip the image back horizontally. The frame has three attributes: height, width and the number of channels. I also added cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) to convert the frame from BGR to RGB. My code worked without it, but this conversion is recommended when you use MediaPipe together with cv2, because the two libraries use different colour orders by default.
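A small robustness tweak you could add here (my code skips it): video.read() can occasionally fail, for example if the camera gets disconnected, and then the frame is None and cv2.flip() would crash. Checking the boolean instead of throwing it away avoids that:
ok, frame = video.read()
if not ok:  # no frame captured, stop the loop
    break
frame = cv2.flip(frame, 1)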
Finding the landmark 8:
Now the frame we got from the RGB conversion needs to be analyzed to find the index finger.
while True:
    _, frame = video.read()
    frame = cv2.flip(frame, 1)
    rgbConvertedFrame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = handGesture.process(rgbConvertedFrame)
    hands = output.multi_hand_landmarks
    if hands:
        for hand in hands:
            drawingTools.draw_landmarks(frame, hand)
            landmarks = hand.landmark
            for id, landmark in enumerate(landmarks):
                if id == 8:
                    <move the mouse pointer>
    cv2.imshow('Virtual Mouse', frame)
    cv2.waitKey(1)
We process the RGB-converted frame with the handGesture object of the Hands class we declared earlier. Its multi_hand_landmarks attribute gives you all the landmarks of every detected hand, and we store that in the hands variable (it is None when no hand is in view, which is why the if check is there). We then draw all the landmarks we’ve got with draw_landmarks, just to get a better view on the output screen. However, we’re only going to work with landmark 8 in our case, so the mouse-binding code executes only when we reach landmark 8.
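As a side note, you don’t strictly need the enumerate loop: hand.landmark can be indexed directly, and if I’m reading the API right, MediaPipe also names the indices in a HandLandmark enum, so something like this should grab the fingertip in one go:
# landmark 8 has the enum name INDEX_FINGER_TIP in MediaPipe
indexTip = hand.landmark[mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP]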
Moving the mouse pointer:
When we find landmark 8, aka the tip of your index finger, we have to locate its position and somehow bind it to the mouse. Say the tip of my index finger is at the centre of the frame right now and I move it to the right; the mouse pointer needs to move to the right too. We can easily do that using pyautogui’s moveTo(x, y) function. My first intuition was:
if id == 8:
    x = int(landmark.x * frameWidth)
    y = int(landmark.y * frameHeight)
    cv2.circle(img=frame, center=(x, y), radius=30, color=(0, 255, 255))
    pyautogui.moveTo(x, y)
However, the dimensions of the frame you are capturing with the camera and the dimensions of your computer screen might not be the same. In my case, the mouse pointer was moving, but the program crashed within seconds and I couldn’t understand why. I needed to scale the values of x and y to the screen.
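Looking back, my guess is that the crash was pyautogui’s built-in fail-safe: by default, pyautogui raises a FailSafeException when the pointer ends up in a corner of the screen, and with unscaled coordinates that can happen almost immediately. Either way, the real fix is scaling.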
screenWidth, screenHeight = pyautogui.size()
pyautogui.size() gives you the size of the screen, and
frameHeight, frameWidth, _ = frame.shape
frame.shape gives you the size of the frame.
Combining these two, it became something like this:
if id == 8:
    x = int(landmark.x * frameWidth)
    y = int(landmark.y * frameHeight)
    cv2.circle(img=frame, center=(x, y), radius=30, color=(0, 255, 255))
    mousePositionX = screenWidth / frameWidth * x
    mousePositionY = screenHeight / frameHeight * y
    pyautogui.moveTo(mousePositionX, mousePositionY)
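To make the scaling concrete, here’s a quick worked example with made-up sizes (your webcam and screen resolutions will differ):
# Suppose the frame is 640x480 and the screen is 1920x1080.
# A fingertip detected at the centre of the frame, (320, 240),
# should land at the centre of the screen:
#   mousePositionX = 1920 / 640 * 320  # = 960.0
#   mousePositionY = 1080 / 480 * 240  # = 540.0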
And the final code looks like this:
import cv2
import mediapipe as mp
import pyautogui

video = cv2.VideoCapture(0)                    # webcam capture
handGesture = mp.solutions.hands.Hands()       # hand-landmark detector
drawingTools = mp.solutions.drawing_utils      # for drawing the landmarks
screenWidth, screenHeight = pyautogui.size()   # screen resolution

while True:
    _, frame = video.read()
    frame = cv2.flip(frame, 1)                 # mirror the frame horizontally
    frameHeight, frameWidth, _ = frame.shape
    rgbConvertedFrame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    output = handGesture.process(rgbConvertedFrame)
    hands = output.multi_hand_landmarks
    if hands:
        for hand in hands:
            drawingTools.draw_landmarks(frame, hand)
            landmarks = hand.landmark
            for id, landmark in enumerate(landmarks):
                if id == 8:                    # landmark 8 = index fingertip
                    x = int(landmark.x * frameWidth)
                    y = int(landmark.y * frameHeight)
                    cv2.circle(img=frame, center=(x, y), radius=30, color=(0, 255, 255))
                    mousePositionX = screenWidth / frameWidth * x
                    mousePositionY = screenHeight / frameHeight * y
                    pyautogui.moveTo(mousePositionX, mousePositionY)
    cv2.imshow('Virtual Mouse', frame)
    cv2.waitKey(1)
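One small tweak I might try later: if the pointer feels too jumpy, pyautogui.moveTo accepts an optional duration argument that glides the pointer over a short interval instead of jumping, for example:
pyautogui.moveTo(mousePositionX, mousePositionY, duration=0.1)  # glide over 0.1 s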
And the output was a window showing my webcam feed with the hand landmarks drawn on it, a yellow circle around my index fingertip, and the mouse pointer following it around the screen.