Human Pose Detection Based on MediaPipe Pose and Holistic with UNIHIKER

Introduction

This project uses an external USB camera connected to the UNIHIKER to capture live video, locate human figures in the frames, and display their detected poses on the screen.

 

Project Objectives

Learn how to use the MediaPipe Pose and Holistic modules for human pose detection.

HARDWARE LIST
1 UNIHIKER - IoT Python Single Board Computer with Touchscreen
1 Type-C&Micro 2-in-1 USB Cable
1 USB camera

Software: 

Mind+ Programming Software

 

Practical Process

1. Hardware Setup

Connect the camera to the USB port of UNIHIKER.

 

 

Connect the UNIHIKER board to the computer via USB cable.

 

 

2. Software Development

Step 1: Open Mind+, and remotely connect to UNIHIKER.

 

Connect Mind+ to the UNIHIKER

 

Step 2: In "Files in UNIHIKER", open the folder named "AI" and create a folder named "Human Pose Detection Based on MediaPipe Pose and Holistic with UNIHIKER" inside it. Then import the dependency files for this lesson.

 

 

Step 3: Create a new project file in the same directory as the above file and name it "main1.py".

Sample Program:

CODE
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_pose = mp.solutions.pose

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320) 
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240) 
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1) 

cv2.namedWindow('MediaPipe Pose', cv2.WND_PROP_FULLSCREEN)
cv2.setWindowProperty('MediaPipe Pose', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)

with mp_pose.Pose(
        min_detection_confidence=0.5,
        model_complexity=0,  
        min_tracking_confidence=0.5  
) as pose:
    while cap.isOpened():  
        success, image = cap.read()  
        if not success:  
            print("Ignoring empty camera frame.")
            continue  

        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  
        results = pose.process(image)  

        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  
        mp_drawing.draw_landmarks(
            image,
            results.pose_landmarks,  
            mp_pose.POSE_CONNECTIONS,  
            landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style()  
        )
        
        image = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
        cv2.imshow('MediaPipe Pose', cv2.flip(image, 1))
        
        if cv2.waitKey(5) & 0xFF == 27:
            break

cap.release()
cv2.destroyAllWindows()

 

Step 4: Create a new project file in the same directory as the above file and name it "main2.py".

Sample Program:

CODE
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_holistic = mp.solutions.holistic

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)  
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)  
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)  

cv2.namedWindow('MediaPipe Holistic', cv2.WND_PROP_FULLSCREEN)
cv2.setWindowProperty('MediaPipe Holistic', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)

with mp_holistic.Holistic(
        min_detection_confidence=0.5,  
        model_complexity=0,  
        min_tracking_confidence=0.5  
) as holistic:
    while cap.isOpened():  
        success, image = cap.read()  
        if not success:  
            print("Ignoring empty camera frame.")
            continue  

        image.flags.writeable = False
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) 
        results = holistic.process(image)  

        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  
        
        mp_drawing.draw_landmarks(
            image,
            results.face_landmarks,  
            mp_holistic.FACEMESH_CONTOURS,  
            landmark_drawing_spec=None,
            connection_drawing_spec=mp_drawing_styles.get_default_face_mesh_contours_style()  
        )
        
        mp_drawing.draw_landmarks(
            image,
            results.pose_landmarks,  
            mp_holistic.POSE_CONNECTIONS,  
            landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style()  
        )
        
        image = cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
        cv2.imshow('MediaPipe Holistic', cv2.flip(image, 1))
        
        if cv2.waitKey(5) & 0xFF == 27:
            break

cap.release()
cv2.destroyAllWindows()

 

3. Run and Debug

Step 1: Execute the "1-Install_dependency.py" program file and wait for the automatic installation of dependencies.

 

 

Step 2: Run the main program 1  
Run the "main1.py" program, and you will see the real-time image captured by the camera displayed on the screen. Point the camera at a person, and you will see their pose detected.

 

 

Step 3: Run the main program 2  
Run the "main2.py" program, and you will see the real-time image captured by the camera displayed on the screen. Point the camera at a person, and you will see both their face and their pose detected.

 

 

4. Program Analysis  

Analysis of main program 1:

In the "main1.py" file mentioned above, we mainly use the OpenCV library to access the camera, reading images from the camera in real-time, and then use the MediaPipe library's pose module for pose recognition and rendering. The recognized pose information is annotated on the image and displayed in the window. The overall process is as follows:

Initialization: Import the required libraries and initialize the MediaPipe drawing and pose recognition modules. Open the camera and set its parameters, creating a full-screen display window.  
Main Loop: The program enters an infinite loop, where in each iteration it performs the following operations:  
- Read a frame from the camera; if reading fails, skip the current frame.  
- Mark the image as non-writable to improve performance, and convert the image color space from BGR to RGB.  
- Use the MediaPipe Pose module to process the image and obtain pose information.  
- Mark the image as writable and convert the color space back to BGR.  
- Draw the pose annotation information on the image.  
- Rotate the image 90 degrees clockwise and flip it horizontally for a selfie view display.  
- Display the processed image in the window. Exit the loop if the Esc key is pressed.  
Ending: When the main loop ends, release the camera device and close all OpenCV windows.

 

Analysis of main program 2:

In the "main2.py" file mentioned above, we primarily use the OpenCV library to access the camera, reading images from the camera in real-time, and then use the MediaPipe library's holistic module for face and pose recognition and rendering. The recognized face and pose information is annotated on the image and displayed in the window. The overall process is as follows:

Initialization: Import the required libraries and initialize the MediaPipe drawing and holistic model modules. Open the camera and set its parameters, creating a full-screen display window.  
Main Loop: The program enters an infinite loop, where in each iteration it performs the following operations:  
- Read a frame from the camera; if reading fails, skip the current frame.  
- Mark the image as non-writable to improve performance, and convert the image color space from BGR to RGB.  
- Use the MediaPipe Holistic module to process the image and obtain face and pose information.  
- Mark the image as writable and convert the color space back to BGR.  
- Draw the landmark annotation information for the face and pose on the image.  
- Rotate the image 90 degrees clockwise and flip it horizontally for a selfie view display.  
- Display the processed image in the window. Exit the loop if the Esc key is pressed.  
Ending: When the main loop ends, release the camera device and close all OpenCV windows.

 

Similarities:

① Library Import:
- Both programs import the OpenCV and MediaPipe libraries for image processing and pose detection.  

② Camera Settings:
- Both programs use OpenCV to open the camera and set the camera's resolution and buffer size.  

③ Full-Screen Display:  
- Both programs create a full-screen window to display the processed images.  

④ Main Loop:  
- Both programs contain a main loop that reads frames from the camera, processes the images, and displays them in the window.  
- In case of read failure, both programs skip the current frame and continue looping.  
- Both detect key events and exit the loop if the Esc key is pressed.  

⑤ Image Processing: 
- Both programs convert the image from BGR to RGB before processing (MediaPipe expects RGB input) and back to BGR afterwards for OpenCV display.  
- Both programs use MediaPipe modules to process the images and draw corresponding landmark annotations.  

⑥ Image Flipping:  
- Both programs rotate the image 90 degrees clockwise and flip it horizontally before displaying it for a selfie view.  
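Point ⑤ is worth unpacking: OpenCV stores frames in BGR channel order, while MediaPipe expects RGB, so each frame is converted before `process()` and converted back before drawing. Conceptually the conversion is just a reversal of the colour channels, which can be sketched with NumPy (`cv2.cvtColor` performs the same operation more efficiently):

```python
import numpy as np

# A tiny 1x1 "frame": a pure-blue pixel in OpenCV's BGR channel order
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels (BGR -> RGB)
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # [0, 0, 255]

# Reversing again restores the original BGR frame for display
assert (rgb[..., ::-1] == bgr).all()
```

This is why each frame round-trips through two `cvtColor` calls in both programs: one conversion for MediaPipe, and one back for the OpenCV window.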

 

Differences: 

① MediaPipe Modules Used:
- The first program uses the `mp.solutions.pose` module for pose recognition.  
- The second program uses the `mp.solutions.holistic` module for holistic recognition, which includes face, pose, and hand detection.  

② Landmark Drawing:
- The first program only draws pose landmarks.  
- The second program draws both face and pose landmarks, including the facial mesh contours.  

③ Variable and Module Names:  
- In the first program, the MediaPipe module variable is named `pose`, while in the second program, it is named `holistic`.  

④ Model Complexity:  
- Both programs set `model_complexity=0`, selecting the lightest and fastest pose model, which suits the UNIHIKER's limited processing power. Higher values (1 or 2) trade speed for accuracy.

Knowledge Corner - MediaPipe Pose and Holistic module

Introduction to the MediaPipe Library 

MediaPipe is a cross-platform, multi-modal machine learning framework developed by Google, primarily used for real-time processing and analysis of video streams. It provides a range of high-performance machine learning models and tools that are widely applied in computer vision tasks, such as gesture recognition, face detection, and pose estimation.

 

MediaPipe Pose Module

Overview

MediaPipe Pose is a high-performance human pose estimation solution that can detect and track skeletal key points of the human body in real-time. It uses convolutional neural networks (CNN) and deep learning techniques to recognize human poses and render the skeletal structure of the body.

 

Features  
- Key Point Detection: Able to detect 33 key points on the human body, including the head, shoulders, elbows, wrists, hips, knees, and ankles.  
- Real-time Performance: Efficient algorithms enable pose estimation in real-time video streams.  
- Platform Support: Supports multiple platforms, including desktop and mobile devices.
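Each of the 33 landmarks is returned with normalized `x` and `y` coordinates, which makes simple geometric analysis straightforward. As an illustrative sketch (not part of the lesson files), the angle at a joint such as the elbow can be computed from three landmark positions, e.g. values read from `results.pose_landmarks.landmark[i].x` and `.y`:

```python
import math

def joint_angle(a, b, c):
    """Angle at point b (in degrees) formed by the segments b->a and b->c.

    Each point is an (x, y) tuple, e.g. the normalized coordinates of a
    MediaPipe pose landmark (shoulder, elbow, wrist).
    """
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

# Example: shoulder, elbow, and wrist forming a right angle
print(joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)))  # 90.0
```

Because the coordinates are normalized, this works regardless of the camera resolution; it is a common building block for exercise counters and posture checks.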

 

MediaPipe Holistic Module  

Overview  
MediaPipe Holistic is a comprehensive solution that can simultaneously detect and track face, hand, and body poses. It combines MediaPipe's Face Mesh, Hands, and Pose modules to provide an integrated multi-modal recognition system.

 

Features  
- Facial Mesh: Detects and tracks 468 key points on the face for detailed facial feature analysis.  
- Hand Key Points: Detects 21 key points for each hand for gesture recognition and tracking.  
- Body Pose: Detects 33 key points on the human body for pose estimation.  
- Multi-modal Integration: Processes and combines data from face, hand, and pose detection simultaneously, providing richer information.
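All three sub-models report landmarks in normalized image coordinates (0 to 1), so they must be scaled to pixels before any custom drawing or measurement. A minimal helper sketch (the function name here is illustrative, not part of the MediaPipe API):

```python
def to_pixel(norm_x, norm_y, frame_width, frame_height):
    """Convert a normalized MediaPipe landmark (0..1) to integer pixel
    coordinates, clamped to the frame bounds."""
    x = min(int(norm_x * frame_width), frame_width - 1)
    y = min(int(norm_y * frame_height), frame_height - 1)
    return max(x, 0), max(y, 0)

# On the 320x240 frames used in this lesson, the image centre maps to:
print(to_pixel(0.5, 0.5, 320, 240))  # (160, 120)
```

Clamping matters because landmarks can fall slightly outside the frame when a body part is partially out of view.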

 

Code: https://drive.google.com/file/d/1vQ2xb2-u5C0b4LsQygqTITrXTeEtWYNL/view?usp=drive_link

License
All Rights Reserved