From Pose Recognition to Oracle Bones: Human Keypoint Detection with the YOLOv8-Pose Model

1. Project Introduction

1.1 Project Overview

Do you want to learn about ancient Chinese characters and culture? Are you curious about the origin of oracle bone script and pictographic characters? We have created an intelligent pose recognizer using UNIHIKER K10! Human keypoints are detected accurately with the YOLOv8-pose model, and the MediaPipe library is used to recognize body angles, positions, and other keypoint information in real time, so you can clearly understand the origin of oracle bone inscriptions and pictographic characters.

The computer and the UNIHIKER K10 communicate via SIoT. When you face the K10's camera and strike a pose, the UNIHIKER K10 records your gesture and matches the most similar pictograph based on your posture. In addition, the UNIHIKER K10's display shows the origin and meaning of that pictograph, providing a highly intelligent experience!

1.2 Project Functional Diagrams

1.3 Project Video

2. Materials List

2.1 Hardware List

HARDWARE LIST
1 UNIHIKER K10
1 USB Cable

2.2 Software

Mind+ Graphical Programming Software (Minimum Version Requirement: V1.8.1 RC1.0)

2.3 Basic Mind+ Software Usage

(1) Double-click to open Mind+

The following screen will appear.

Click and switch to offline mode.

(2) Load UNIHIKER K10

Following the previous steps, click "Extensions", find the "UNIHIKER K10" module under the "Board" tab, and click to add it. After clicking "Back", you will find UNIHIKER K10 in the "Command Area", which completes loading the board.

Then, connect the UNIHIKER K10 to the computer with a USB cable.

Next, click "Connect Device" and then select "COM-UNIHIKER K10" to connect.

Note: The device names of different UNIHIKER K10 boards may vary, but they all end with K10.
In Windows 10/11, UNIHIKER K10 is driver-free. However, for Windows 7, manual driver installation is required: https://www.unihiker.com/wiki/K10/faq/#high-frequency-problem.
The next interface you see is the Mind+ programming interface. Let's see what this interface consists of.

Note: For a detailed description of each area of the Mind+ interface, see the Knowledge Hub section of this lesson.

3. Construction Steps

The project is divided into three main parts:
(1) Task 1: UNIHIKER K10 Networking and Webcam Activation
Connect UNIHIKER K10 through IoT communication and enable the webcam function to establish a visual communication channel with the computer for transmitting video data.
(2) Task 2: Visual Detection and Data Upload
The video frame information containing the human body captured by the UNIHIKER K10 camera is transmitted to the computer. The real-time video stream is processed using the computer's YOLOv8-pose model and MediaPipe library. Meanwhile, connect the UNIHIKER K10 to the MQTT platform and upload the detection results from the computer to the SIoT platform.
(3) Task 3: UNIHIKER K10 Receiving Results and Executing Control
The UNIHIKER K10 remotely retrieves the inference results from SIoT and displays the most similar oracle bone inscription on its screen according to the detected body movement, along with an introduction to that inscription.

3.1 Task 1: UNIHIKER K10 Networking and Webcam Activation

(1) Hardware Setup

Confirm that the UNIHIKER K10 is connected to the computer via a USB cable.

(2) Software Preparation

Make sure that Mind+ is opened and the UNIHIKER board has been successfully loaded. Once confirmed, you can proceed to write the project program.

(3) Write the Program

UNIHIKER K10 Network Connection and Open Webcam

To enable communication between the computer and UNIHIKER K10, first ensure both devices are connected to the same local network.

First, add the MQTT communication and Wi-Fi modules from the extension library. Refer to the diagram for the commands.

Once the UNIHIKER K10 is connected to the network, its camera and networking functions are used to stream camera frames to the local area network (LAN), so that any computer on the LAN can access them at any time. This allows the computer to apply existing computer vision libraries to the images for recognition. Therefore, we next need to load the webcam library so that the frames captured by the UNIHIKER K10 can be transmitted to the computer.

Click on "Extensions" in the Mind+ toolbar, enter the "User Ext" category. Input: https://gitee.com/yeezb/k10web-cam in the search bar, and click to load the K10 webcam extension library.
User library link:https://gitee.com/yeezb/k10web-cam

We need to use the "Wi-Fi connect to account (SSID, Password)" command in the network communication extension library to configure Wi-Fi for the UNIHIKER K10. Please ensure that the UNIHIKER K10 is connected to the same Wi-Fi network as your computer. We also need the "Webcam On" block to enable the webcam feature so that the frames captured by the UNIHIKER K10 can be transmitted to the computer.

Once the network connection is successfully established, the "Webcam On" block can be used to transmit the frames captured by the UNIHIKER K10 to the computer.

Then we need to read the UNIHIKER K10's IP address from the serial monitor. Use a repeat-5-times loop so that the IP address is printed five times. Inside the loop, drag in the serial content output block, select string output, and enable line wrapping; for its content, use the Wi-Fi configuration block and select the acquired IP address. Add a one-second wait so the address is printed once per second.

Click the Upload button. When the flashing progress reaches 100%, the program has been uploaded successfully.

Open the serial monitor, and you will see the IP address of the UNIHIKER K10.
Open IP/stream in your browser to view the camera feed. For example, with the IP address shown in the picture above, you would enter 192.168.11.72/stream in your browser.
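If you prefer to verify the stream from Python rather than the browser, the short sketch below opens the MJPEG stream with OpenCV's VideoCapture. The IP address 192.168.11.72 is only an example; replace it with the address shown in your own serial monitor.

CODE
# Optional stream check: open the K10's MJPEG feed with OpenCV.
# Replace 192.168.11.72 with the IP address printed in your serial monitor.
import cv2

cap = cv2.VideoCapture("http://192.168.11.72/stream")
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("K10 Stream Check", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()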

3.2 Task 2: Vision Detection and Data Upload

Next, the UNIHIKER K10 needs to connect to the MQTT platform. We will create an SIoT topic to store the results, and use the computer to detect human keypoints and estimate the current posture of the body from them. The detection results will then be sent to SIoT for subsequent processing by the UNIHIKER K10.

(1) Prepare the Computer Environment

First, on our computer, we need to download the Windows version of SIoT_V2, extract it, and double-click "start SIoT.bat" to start SIoT. After starting, a black window will pop up to initialize the server. Critical note: DO NOT close this window during operation, as doing so immediately terminates the SIoT service.

Note: For details on downloading SIoT_V2, please refer to: https://drive.google.com/file/d/1qVhyUmvdmpD2AYl-2Cijl-2xeJgduJLL/view?usp=drive_link

After starting SIoT.bat on the computer, initialize the MQTT parameters in the UNIHIKER K10 program: set the IP address to the local computer's IP, the username to SIoT, and the password to dfrobot.

We need to install the required Python dependencies, which are used to recognize and process human keypoint information. Open a new Mind+ window, navigate to the Mode Switch section, and select "Python Mode".

In Python Mode, click "Code" in the toolbar, then navigate to Library Management; the Library Installation page will open for dependency management.

Click "PIP Mode" and run the following commands in sequence to install six libraries including the mediapipe library and the ultralytics library.

CODE
pip install mediapipe
pip install ultralytics
pip install numpy
pip install requests
pip install opencv-python
pip install opencv-contrib-python
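
After installation, you can optionally confirm that the libraries import correctly. The minimal sketch below only checks the imports and prints version numbers; it assumes the installation above succeeded.

CODE
# Sanity check: confirm the main dependencies import correctly.
import cv2
import mediapipe
import numpy
import requests
import ultralytics

print("opencv:", cv2.__version__)
print("mediapipe:", mediapipe.__version__)
print("numpy:", numpy.__version__)
print("requests:", requests.__version__)
print("ultralytics:", ultralytics.__version__)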

(2) Write the Program

STEP One: Create Topics

Access "Your Computer IP Address:8080", such as"192.168.11.41:8080",in a web browser on your computer.

Enter the username 'SIoT' and password 'dfrobot' to log in to the SIoT IoT platform.

After logging into the SIoT platform, navigate to the Topic section and create a topic named 'abs' (used for storing control instructions for the UNIHIKER K10 display). Refer to the operations shown in the image below.
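
If you want to confirm that the topic and broker are reachable, you can publish a test message from Python with the siot library. This is a minimal sketch assuming SIoT is running on the local machine (server 127.0.0.1) with the default siot/dfrobot credentials; the client_id below is an arbitrary placeholder.

CODE
# Optional check: publish a test message to the 'abs' topic with the siot library.
# Assumes SIoT is running locally and the topic 'abs' has been created as above.
import time
import siot

siot.init(client_id="topic-check-001",  # placeholder client ID
          server="127.0.0.1", port=1883,
          user="siot", password="dfrobot")
siot.connect()
siot.loop()

siot.publish(topic="siot/abs", data="b")  # send a test command
time.sleep(1)  # give the message a moment to reach the broker
print("Test message 'b' published to siot/abs")
# Afterwards, the message should appear under the abs topic in the SIoT web page.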

Next, we will write the code for human posture detection, which includes two main functional modules: human keypoint monitoring and angle calculation.

Create a new Python file named "visiondetect.py" in Mind+: in the "Files in Project" directory of the right sidebar, create a new .py file named "visiondetect".

STEP Two: Human Posture Detection Code

We use the YOLOv8-pose model together with MediaPipe to detect human keypoints and classify the pose based on the body tilt angle (with 15° and 45° as thresholds), the arm angle (whether it is at least 170°), and the arm spread in the image. The corresponding instruction is then published to the abs topic on the SIoT platform.
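The decision rules can be summarized as a small standalone function, shown below as a sketch of the thresholds only; the complete script that follows also handles the video stream, drawing, and MQTT publishing.

CODE
# Summary of the pose-classification thresholds used in the full script below.
def classify_pose(body_angle, arm_angle, arm_spread_ratio):
    """Return the command letter published to siot/abs, or None if undefined."""
    if body_angle <= 15 and arm_angle >= 170:
        return "b"  # Stand Straight
    if body_angle <= 15 and arm_spread_ratio > 1.8:
        return "a"  # Open Body
    if 15 < body_angle <= 45:
        return "c"  # Moderate Bend
    if body_angle > 45:
        return "d"  # Deep Bend
    return None

print(classify_pose(10, 175, 1.0))  # -> 'b' (Stand Straight)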

CODE
import cv2
import mediapipe as mp
import numpy as np
import time
import siot
import math
import requests  # Added HTTP request library

# Initialize MediaPipe Pose module for human pose detection
mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False,
                    model_complexity=1,
                    smooth_landmarks=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)

# Initialize drawing utilities
mp_drawing = mp.solutions.drawing_utils

# Initialize Siot connection for IoT communication
# MQTT server: 127.0.0.1
siot.init(client_id="32867041742986824", server="127.0.0.1", port=1883, user="siot", password="dfrobot")
siot.connect()
siot.loop()

# Calculate the angle between two vectors (in degrees)
def calculate_angle(a, b, c):
    a = np.array(a)
    b = np.array(b)
    c = np.array(c)
    
    ba = a - b
    bc = c - b
    
    dot_product = np.dot(ba, bc)
    mag_ba = np.linalg.norm(ba)
    mag_bc = np.linalg.norm(bc)
    
    if mag_ba * mag_bc == 0:
        return 180
    
    angle_rad = np.arccos(dot_product / (mag_ba * mag_bc))
    angle_deg = np.degrees(angle_rad)
    
    return angle_deg

# Calculate body and arm angles
def calculate_body_angle(landmarks, frame_shape):
    h, w = frame_shape[:2]
    
    landmarks_list = landmarks.landmark
    
    # Shoulder positions
    left_shoulder = [landmarks_list[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x * w,
                     landmarks_list[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y * h]
    right_shoulder = [landmarks_list[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].x * w,
                      landmarks_list[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].y * h]
    
    # Hip positions
    left_hip = [landmarks_list[mp_pose.PoseLandmark.LEFT_HIP.value].x * w,
                landmarks_list[mp_pose.PoseLandmark.LEFT_HIP.value].y * h]
    right_hip = [landmarks_list[mp_pose.PoseLandmark.RIGHT_HIP.value].x * w,
                 landmarks_list[mp_pose.PoseLandmark.RIGHT_HIP.value].y * h]
    
    # Calculate midpoints of shoulders and hips
    shoulder_mid = [(left_shoulder[0] + right_shoulder[0]) / 2,
                    (left_shoulder[1] + right_shoulder[1]) / 2]
    hip_mid = [(left_hip[0] + right_hip[0]) / 2,
               (left_hip[1] + right_hip[1]) / 2]
    
    # Calculate reference point (vertical direction reference)
    vertical_ref = [hip_mid[0], hip_mid[1] - 100]  # 100 pixels upward
    
    # Calculate body tilt angle
    body_angle = calculate_angle(shoulder_mid, hip_mid, vertical_ref)
    
    # Calculate arm angles (shoulder-elbow-wrist)
    left_elbow = [landmarks_list[mp_pose.PoseLandmark.LEFT_ELBOW.value].x * w,
                  landmarks_list[mp_pose.PoseLandmark.LEFT_ELBOW.value].y * h]
    left_wrist = [landmarks_list[mp_pose.PoseLandmark.LEFT_WRIST.value].x * w,
                  landmarks_list[mp_pose.PoseLandmark.LEFT_WRIST.value].y * h]
    left_arm_angle = calculate_angle(left_shoulder, left_elbow, left_wrist)
    
    right_elbow = [landmarks_list[mp_pose.PoseLandmark.RIGHT_ELBOW.value].x * w,
                   landmarks_list[mp_pose.PoseLandmark.RIGHT_ELBOW.value].y * h]
    right_wrist = [landmarks_list[mp_pose.PoseLandmark.RIGHT_WRIST.value].x * w,
                   landmarks_list[mp_pose.PoseLandmark.RIGHT_WRIST.value].y * h]
    right_arm_angle = calculate_angle(right_shoulder, right_elbow, right_wrist)
    
    # Calculate average arm angle
    arm_angle = (left_arm_angle + right_arm_angle) / 2
    
    # Calculate arm spread (distance between wrists)
    arm_spread = np.linalg.norm(np.array(left_wrist) - np.array(right_wrist))
    
    return body_angle, arm_angle, arm_spread, shoulder_mid, hip_mid

# Track the last pose state
last_pose = None

# UniHIKER K10 video stream URL
url = 'http://192.168.11.72/stream'  

# Main processing loop (modified for HTTP stream)
def main():
    global last_pose
    
    img_data = b''
    
    try:
        response = requests.get(url, stream=True, timeout=10)
        print("[Video] Connected to UniHIKER video stream")
        
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                img_data += chunk

                # Find JPEG frame start and end markers
                start_idx = img_data.find(b'\xff\xd8')  # JPEG start
                end_idx = img_data.find(b'\xff\xd9')    # JPEG end

                if start_idx != -1 and end_idx != -1 and end_idx > start_idx:
                    # Extract complete JPEG frame
                    jpg_data = img_data[start_idx:end_idx+2]
                    
                    # Clear processed data
                    img_data = img_data[end_idx+2:] 
                    
                    # Convert to OpenCV image
                    img_np = np.frombuffer(jpg_data, dtype=np.uint8)
                    frame = cv2.imdecode(img_np, cv2.IMREAD_COLOR)
                    
                    if frame is None:
                        continue
                    
                    # Process video frame
                    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                    pose_results = pose.process(rgb_frame)
                    
                    if pose_results.pose_landmarks:
                        # Draw pose landmarks and connections
                        mp_drawing.draw_landmarks(
                            frame,
                            pose_results.pose_landmarks,
                            mp_pose.POSE_CONNECTIONS,
                            mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2),
                            mp_drawing.DrawingSpec(color=(255, 0, 0), thickness=2)
                        )
                        
                        # Calculate body angles (for pose recognition)
                        body_angle, arm_angle, arm_spread, shoulder_mid, hip_mid = calculate_body_angle(
                            pose_results.pose_landmarks, frame.shape)
                        
                        # Calculate shoulder distance for normalization
                        landmarks_list = pose_results.pose_landmarks.landmark
                        left_shoulder = [
                            landmarks_list[mp_pose.PoseLandmark.LEFT_SHOULDER.value].x * frame.shape[1],
                            landmarks_list[mp_pose.PoseLandmark.LEFT_SHOULDER.value].y * frame.shape[0]
                        ]
                        right_shoulder = [
                            landmarks_list[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].x * frame.shape[1],
                            landmarks_list[mp_pose.PoseLandmark.RIGHT_SHOULDER.value].y * frame.shape[0]
                        ]
                        shoulder_dist = np.linalg.norm(np.array(left_shoulder) - np.array(right_shoulder))
                        
                        normalized_arm_spread = arm_spread / shoulder_dist if shoulder_dist > 0 else 0
                        
                        # Draw body center line
                        hip_mid_int = (int(hip_mid[0]), int(hip_mid[1]))
                        shoulder_mid_int = (int(shoulder_mid[0]), int(shoulder_mid[1]))
                        cv2.line(frame, hip_mid_int, shoulder_mid_int, (0, 0, 255), 3)
                        
                        # Determine current pose state
                        current_pose = None
                        
                        if body_angle <= 15 and arm_angle >= 170:
                            current_pose = "Stand Straight"
                            pose_text = "Pose: Stand Straight (b)"
                            text_color = (0, 255, 0)
                        
                        elif body_angle <= 15 and normalized_arm_spread > 1.8:
                            current_pose = "Open Body"
                            pose_text = "Pose: Open Body (a)"
                            text_color = (0, 165, 255)
                        
                        elif 15 < body_angle <= 45:
                            current_pose = "Moderate Bend"
                            pose_text = "Pose: Moderate Bend (c)"
                            text_color = (0, 255, 255)
                        
                        elif body_angle > 45:
                            current_pose = "Deep Bend"
                            pose_text = "Pose: Deep Bend (d)"
                            text_color = (0, 0, 255)
                        else:
                            pose_text = "Pose: Undefined"
                            text_color = (255, 255, 255)
                            
                        cv2.putText(frame, pose_text, (10, 30), 
                                    cv2.FONT_HERSHEY_SIMPLEX, 1, text_color, 2)
                        
                        # Send control commands only when pose changes
                        if current_pose != last_pose and current_pose is not None:
                            # Print detailed body angle information
                            print(f"[Body Info] Body Angle: {body_angle:.1f}°, "
                                  f"Arm Angle: {arm_angle:.1f}°, "
                                  f"Arm Spread: {arm_spread:.1f}px, "
                                  f"Arm Spread Ratio: {normalized_arm_spread:.2f}")
                            
                            if current_pose == "Stand Straight":
                                print("[Pose] Stand Straight detected")
                                print("[abs Command] Sending 'b'")
                                siot.publish(topic="siot/abs", data="b")
                            elif current_pose == "Open Body":
                                print("[Pose] Open Body detected")
                                print("[abs Command] Sending 'a'")
                                siot.publish(topic="siot/abs", data="a")
                            elif current_pose == "Moderate Bend":
                                print("[Pose] Moderate Bend detected")
                                print("[abs Command] Sending 'c'")
                                siot.publish(topic="siot/abs", data="c")
                            elif current_pose == "Deep Bend":
                                print("[Pose] Deep Bend detected")
                                print("[abs Command] Sending 'd'")
                                siot.publish(topic="siot/abs", data="d")
                            
                            last_pose = current_pose
                    
                    # Display system status
                    status = "Active" if pose_results.pose_landmarks else "Inactive"
                    cv2.putText(frame, f'Pose Tracking: {status}',
                                (10, frame.shape[0] - 20), 
                                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
                    
                    # Display processed image
                    cv2.imshow('Body Pose Recognition', frame)
                    
                    # Exit detection
                    if cv2.waitKey(5) & 0xFF == 27:
                        break
            
            # Exit detection
            if cv2.waitKey(5) & 0xFF == 27:
                break
                
    except requests.exceptions.RequestException as e:
        print(f"[Error] Video stream error: {e}")
    except KeyboardInterrupt:
        print("[Exit] Program terminated by user")
    finally:
        cv2.destroyAllWindows()

if __name__ == '__main__':
    main()

Since we need to obtain the camera feed from the UNIHIKER K10, replace the IP address on line 102 of the code with the IP address of your own UNIHIKER K10.

Copy the code into the "visiondetect.py" file created in Mind+. (Note: the complete "visiondetect.py" file is provided as an attachment for reference.)
Click Run to start the program.

When your whole body appears in the window, the program will detect your body position.
If you then perform different actions in front of the camera, the recognized actions will be displayed in the video feed.

Monitor the terminal output for real-time status updates, as shown in the figure below.

3.3 Task 3: UNIHIKER K10 Receives Results and Executes Control

Next, we will implement the last function: based on the posture information received from SIoT, the UNIHIKER K10 will display the oracle bone inscription image corresponding to the detected posture, together with its introduction.

From left to right, the actions correspond to the oracle bone inscriptions for "large", "stand", "old", and "bend".

(1) Software Preparation

Make sure that Mind+ is opened and the UNIHIKER board has been successfully loaded. Once confirmed, you can proceed to write the complete project program.

(2) Write the Program
STEP One: Subscribe to SIoT Topics on UNIHIKER K10
Add an MQTT subscribe block and enter "siot/abs" in the topic input field.



STEP Two: Match the corresponding picture
Use the "When MQTT message received from topic_0" block to enable the smart terminal to process commands from siot/abs

Then, use the "Cache Local Images" and "Show Cached Content" modules to display the matching oracle bone script images on the screen of the UNIHIKER K10.
Use the "if-then" block and comparison operators to set the display of images based on the content of MQTT messages.If the MQTT message is equal to "a", an image of the oracle bone script for "large" will be displayed. If the MQTT message is equal to "b", an image of the oracle bone script of the character "stand" will be displayed. If the MQTT message is equal to "c", an image of the oracle bone script for "old" will be displayed. If the MQTT message is equal to "d", an image of the oracle bone script of the character "bend" will be displayed.


Select the pictures in the attachment folder, and then choose the pictures corresponding to "large", "stand", "old", and "bend" from top to bottom in sequence.


Below is the reference for the complete program.


Click the Upload button. When the flashing progress reaches 100%, the program has been uploaded successfully.


Steps:
(1) Click the Run button to start visiondetect.py
(2) Point your body at the camera:
- UNIHIKER K10 will capture video, and the angle information will be displayed in the video feed
- Perform actions (such as standing or bending over) to match the pictographs

Displayed on UNIHIKER K10: "大 (dà, meaning 'large' or 'big')", "立 (lì, meaning 'stand' or 'erect')", "老 (lǎo, meaning 'old' or 'elderly')", "伏 (fú, meaning 'prostrate' or 'bend')".

4. Knowledge Hub

4.1 What is the Human Keypoints Model and What are Its Applications?

The human keypoints model is a computer vision model used to detect and locate specific key parts (keypoints) of the human body. Its core task is to extract body-structure information from pixel data, enabling quantitative analysis of human movements and postures. By analyzing the positions and relationships of these keypoints, it becomes possible to understand information such as human posture and movement.
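As a concrete illustration, the minimal sketch below extracts keypoints from a single image with MediaPipe Pose (the same library used in Task 2) and prints the normalized coordinates of a few landmarks. The file name sample.jpg is only a placeholder.

CODE
# Minimal keypoint-extraction sketch using MediaPipe Pose on a single image.
# 'sample.jpg' is a placeholder; use any photo that shows a full human body.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
image = cv2.imread("sample.jpg")
if image is None:
    raise SystemExit("Could not read sample.jpg")

with mp_pose.Pose(static_image_mode=True) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    for name in ("NOSE", "LEFT_SHOULDER", "RIGHT_SHOULDER", "LEFT_HIP", "RIGHT_HIP"):
        lm = results.pose_landmarks.landmark[mp_pose.PoseLandmark[name].value]
        print(f"{name}: x={lm.x:.3f}, y={lm.y:.3f}, visibility={lm.visibility:.2f}")
else:
    print("No person detected in the image.")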

These models serve as the foundation for:

- Abnormal behavior detection: By analyzing the dynamic changes of key points in human posture, it identifies dangerous behaviors such as falls and fights, and triggers an alarm.

- Service robot interaction: Robots understand instructions (such as "Come here" and "stop") by recognizing key information such as human gestures and standing postures.

- Character animation generation: By capturing the key point movement data of actors, quickly generate the pose frames of animated characters, reducing the cost of manual modeling (such as the motion capture technology in "Avatar").

4.2 What are the methods for human motion detection? What are their applications?

Human motion detection is a process of identifying and analyzing human postures, movement trajectories, and behavioral patterns through technical means. Based on the technical principles and devices involved, it can be divided into: (1) computer vision-based detection (non-contact), (2) sensor-based detection (contact/near-field), and (3) hybrid detection (multimodal fusion).

In daily life, human motion detection models have a wide range of applications, and they are the basis for the following practical applications:

- Security and surveillance: Counting the movement trajectories of people (such as the walking routes of shopping mall customers) to optimize the layout or the deployment of police forces.

- Medical treatment and rehabilitation: Through electromyographic sensors or posture estimation, disabled people can control prosthetics and wheelchairs with movements (such as "nodding" to start a wheelchair and "making a fist" to grab objects).

- Industry and production safety: Monitoring whether workers operate in accordance with procedures (such as whether they "reach out in violation of regulations" when operating a robotic arm) to avoid safety accidents.

License: All Rights Reserved