Gesture recognition and tracking | Mediapipe Hands x UNIHIKER


This project connects an external USB camera to UNIHIKER and uses it to recognize hands and draw keypoints on them.


Project Objectives

Learn how to use MediaPipe's Hands model for gesture recognition and draw key points.

1 UNIHIKER - IoT Python Single Board Computer with Touchscreen
1 Type-C&Micro 2-in-1 USB Cable
1 USB camera


Mind+ Programming Software


Practical Process

1. Hardware Setup

Connect the camera to the USB port of UNIHIKER.


Connect the UNIHIKER board to the computer via USB cable.


2. Software Development

Step 1: Open Mind+ and remotely connect to UNIHIKER.


Step 2: In "Files in UNIHIKER", find the folder named "AI" and create a folder named "Gesture recognition and tracking based on Mediapipe and UNIHIKER" inside it. Create a new project file and name it "".

Sample Program:

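import cv2
import mediapipe as mp
# Shortcuts to the MediaPipe drawing helpers and the Hands solution
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands
# Open the USB camera at low resolution with a one-frame buffer to reduce latency
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
# Show the output full screen on the UNIHIKER display
cv2.namedWindow('MediaPipe Hands', cv2.WND_PROP_FULLSCREEN)
cv2.setWindowProperty('MediaPipe Hands', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
with mp_hands.Hands(
    min_tracking_confidence=0.5) as hands:
  while cap.isOpened():
    success, image = cap.read()
    if not success:
        print("Ignoring empty camera frame.")
        continue
    # Mark the frame read-only while the model runs, and convert BGR to RGB
    image.flags.writeable = False
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = hands.process(image)
    # Convert back to BGR so OpenCV can draw on and display the frame
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # Draw the 21 keypoints and their connections for each hand
            mp_drawing.draw_landmarks(
                image,
                hand_landmarks,
                mp_hands.HAND_CONNECTIONS,
                mp_drawing_styles.get_default_hand_landmarks_style(),
                mp_drawing_styles.get_default_hand_connections_style())
    # Rotate to portrait and mirror the frame for display
    image = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
    cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))
    # Exit the loop when the ESC key is pressed
    if cv2.waitKey(5) & 0xFF == 27:
        break
cap.release()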


3. Run and Debug

Step 1: Run the main program

Run the "" program. The screen initially shows the real-time image captured by the camera. Point the camera at a hand, and the hand is recognized and its key points are marked on the image.



4. Program Analysis

In the "" file above, we use the OpenCV library to open the camera and read the real-time video stream, and then use MediaPipe's Hands model to detect hands and draw keypoints on each frame. The overall process is as follows.

1. Turn on the camera: The program first turns on the camera to get the real-time video stream.

2. Read each frame: In an infinite loop, the program reads each frame from the camera.

3. Image preprocessing: The captured frame is converted from BGR to RGB, because OpenCV reads images in BGR format by default while the MediaPipe Hands model expects RGB input. The frame is also marked as not writeable while the model runs.

4. Apply Hands model: The RGB image is passed to the Hands model through hands.process(), which returns the keypoint coordinates of every detected hand in results.multi_hand_landmarks.

5. Draw key points and connecting lines: If a hand is detected in the image, the program draws the key points and connecting lines of the hand.

6. Display the processed image: The image with keypoints and connecting lines is rotated 90 degrees clockwise, flipped horizontally so that it behaves like a mirror, and displayed.

7. Check user input: If the user presses the ESC key, exit the loop and end the program.
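The display step in the sample program rotates each frame 90 degrees clockwise and then flips it horizontally. As a rough illustration of what that does to the pixel layout, the sketch below applies the same two operations to a tiny nested list instead of a real camera frame. The helper names are hypothetical and are not part of OpenCV; the point is only that the two operations together amount to a transpose.

```python
def rotate_90_clockwise(img):
    """Rotate a row-major nested list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror every row left to right."""
    return [row[::-1] for row in img]


def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]


# A 3-wide, 2-high stand-in for a landscape camera frame
frame = [
    [1, 2, 3],
    [4, 5, 6],
]

# Same order of operations as the sample program: rotate first, then flip
displayed = flip_horizontal(rotate_90_clockwise(frame))
print(displayed)
print(displayed == transpose(frame))
```

With a real frame, this means a point at column x and row y in the camera image ends up at column y and row x on the screen.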



MediaPipe Hands

MediaPipe's Hands model is a hand keypoint detection model that locates 21 keypoints on each hand in an RGB image and can track more than one hand at a time. These keypoints include the wrist and the joints and tip of every finger, and they can be used to characterize hand posture and gestures.
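To give a feel for how the 21 keypoints can describe a gesture, here is a minimal sketch of a common heuristic: a finger counts as extended when its tip is higher in the image than its middle (PIP) joint. The indices follow MediaPipe's hand landmark numbering (for example, 8 is the index fingertip and 6 is the index finger PIP joint). The function name is hypothetical, the thumb is left out, and the rule assumes an upright hand, so treat it as an illustration and not as the method used by the sample program.

```python
# Landmark indices in MediaPipe's 21-point hand layout
FINGER_TIPS = {"index": 8, "middle": 12, "ring": 16, "pinky": 20}
FINGER_PIPS = {"index": 6, "middle": 10, "ring": 14, "pinky": 18}


def extended_fingers(landmarks):
    """Return the names of fingers whose tip lies above its PIP joint.

    `landmarks` is a list of 21 (x, y) pairs in image coordinates,
    where y grows downward. Assumes the hand is roughly upright.
    """
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    return [
        name
        for name, tip in FINGER_TIPS.items()
        if landmarks[tip][1] < landmarks[FINGER_PIPS[name]][1]
    ]


# Synthetic example: every fingertip folded below its joint except the index finger
points = [(0.5, 0.5)] * 21
for tip in FINGER_TIPS.values():
    points[tip] = (0.5, 0.6)
points[FINGER_TIPS["index"]] = (0.5, 0.2)
print(extended_fingers(points))
```

In the sample program, the (x, y) pairs could be taken from each entry of hand_landmarks.landmark, which exposes x and y attributes.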

The Hands model uses a regression-based approach to achieve hand keypoint detection by predicting the coordinate positions of the keypoints of the hand in the image. The input to the model is an RGB image and the output is the coordinates of 21 key points of each hand in the image.
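The coordinates returned by the Python API are normalized: x and y are expressed as fractions of the image width and height. The sketch below shows one way to turn them into pixel positions for the 320x240 frames used in this project. The helper name to_pixel is hypothetical, and the clamping is an added precaution for keypoints predicted slightly outside the frame.

```python
def to_pixel(x_norm, y_norm, width, height):
    """Convert normalized landmark coordinates to integer pixel coordinates.

    Values are clamped so the result always lies inside the image.
    """
    px = max(0, min(int(x_norm * width), width - 1))
    py = max(0, min(int(y_norm * height), height - 1))
    return px, py


# A keypoint in the middle of a 320x240 frame
center = to_pixel(0.5, 0.5, 320, 240)
print(center)
```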

The Hands solution is built as a two-stage pipeline. A palm detector known as BlazePalm first locates hands in the full image and returns an oriented bounding box for each one, and a hand landmark model then predicts the 21 keypoints inside that region. The pipeline also reports handedness, which is useful for distinguishing between left and right hands. The model is trained using a large number of real-world images as well as synthetic images, and is able to handle a variety of complex scenarios, including different lighting conditions and hand occlusions.
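The left or right label is available in the Python API through results.multi_handedness, next to results.multi_hand_landmarks. The sketch below reads the label and score for each detected hand. The function name hand_labels is hypothetical, and a stand-in object replaces a real result so the snippet runs without a camera.

```python
from types import SimpleNamespace


def hand_labels(results):
    """Return a (label, score) pair for each detected hand, or [] if none."""
    if not results.multi_handedness:
        return []
    return [
        (hand.classification[0].label, hand.classification[0].score)
        for hand in results.multi_handedness
    ]


# Stand-in shaped like the object returned by hands.process(); not real model output
fake_results = SimpleNamespace(
    multi_handedness=[
        SimpleNamespace(
            classification=[SimpleNamespace(label="Left", score=0.97)]
        )
    ]
)
labels = hand_labels(fake_results)
print(labels)
```

MediaPipe determines handedness on the assumption that the input image is mirrored. The sample program passes the unflipped frame to the model and only flips it for display, so the labels may appear swapped relative to the user's actual hands.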

MediaPipe provides complete model files and related APIs so that developers can easily integrate the Hands model in their applications. Moreover, MediaPipe also provides a complete set of solutions, including hand tracking, gesture recognition and other functions, which can help developers more easily develop complex interactive applications.


Feel free to join our UNIHIKER Discord community! You can engage in more discussions and share your insights!
