Intelligent Follow-up Fan Based on YOLO Facial Keypoint Detection and Hand Gesture Detection

1. Project Introduction

1.1 Project Overview


Craving a "smart fan" that understands your needs in the sweltering summer? We've built an intelligent follow-up fan from the UNIHIKER K10, a servo motor, and a fan module! By leveraging the YOLO Pose keypoint model for precise facial keypoint detection and combining it with the MediaPipe library for real-time hand gesture recognition (such as open palm and fist), this fan gains "smart vision" and the ability to understand you.

The computer communicates with the UNIHIKER K10 via SIoT. Face the camera and show an open palm, and the UNIHIKER K10 starts the fan; clench your fist, and the fan stops. In addition, the UNIHIKER K10 works in tandem with a servo motor so the fan automatically tracks your face, combining technological appeal with real practicality!

1.2 Project Functional Diagrams

1.3 Project Video

2. Materials List

2.1 Hardware List

HARDWARE LIST
1 UNIHIKER K10
1 80° Clutch Servo
1 Gravity: DC Fan Module
1 USB-C Cable
1 PH2.0-3P Male-to-Male Cable


2.2 Software

Mind+ Graphical Programming Software (Minimum Version Requirement: V1.8.1 RC3.0)

2.3 Basic Mind+ Software Usage

(1) Double-click to open Mind+

The following screen will appear.

Click to switch to Offline mode.

(2) Load UNIHIKER K10

Next, click "Extensions", find the "UNIHIKER K10" module under the "Board" category, and click to add it. After clicking "Back", the UNIHIKER K10 blocks appear in the Command Area, which completes the loading of the board.

Then connect the UNIHIKER K10 to the computer with a USB cable.

After that, click Connect Device and select COM7-UNIHIKER K10 to connect.

Note: The device name may differ from one UNIHIKER K10 to another, but it always ends with K10.


In Windows 10/11, the UNIHIKER K10 is driver-free. However, for Windows 7, manual driver installation is required: https://www.unihiker.com/wiki/K10/faq/#high-frequency-problem.


The next interface you see is the Mind+ programming interface. Let's see what this interface consists of.

Note: For a detailed description of each area of the Mind+ interface, see the Knowledge Hub section of this lesson.

 

3. Construction Steps

 

Let's build the intelligent tracking fan! The project is divided into three main parts:

Task 1: UNIHIKER K10 MQTT Setup: UNIHIKER K10 connects to MQTT and subscribes to SIoT messages, establishing a communication channel for receiving control commands from the computer vision system.

 

Task 2: Computer Vision Detection and Data Upload: Use a computer camera for facial keypoint and gesture detection. The YOLO Pose model and MediaPipe library process real-time video feeds, then upload detection results to the SIoT platform.

 

Task 3: UNIHIKER K10 Control Loop: UNIHIKER K10 remotely retrieves inference results from SIoT to control the servo motor and fan. The microcontroller processes received data to adjust servo angles for facial tracking and toggle the fan on/off based on gesture commands.

 

3.1 Task 1: UNIHIKER K10 MQTT Setup

 

(1) Hardware Setup

Confirm that the UNIHIKER K10 is connected to the computer via a USB cable.

 

(2) Software Preparation

Make sure that Mind+ is opened and the UNIHIKER board has been successfully loaded. Once confirmed, you can proceed to write the project program.

 

 

(3) Write the Program

 

UNIHIKER K10 Network Connection

To enable communication between the computer and UNIHIKER K10, first ensure both devices are connected to the same local network.

 

 

Note: For more information about the MQTT protocol and IoT components, refer to the Knowledge Hub

 

Add MQTT communication and Wi-Fi modules from the extension library. Refer to the diagram for commands.

 

 

We need to set up Wi-Fi on the UNIHIKER K10 using the "Wi-Fi connect to account (SSID, Password)" command. Make sure the K10 connects to the same Wi-Fi network as your computer. Example commands are shown below:

 

 

On the computer, download the Windows version of SIoT_V2, extract it, and double-click SIoT.bat to start SIoT. After it starts, a black window will pop up to initialize the server. Critical note: do NOT close this window while the project is running, as doing so immediately terminates the SIoT service.

 

 

Note: For details on downloading SIoT_V2, please refer to: https://drive.google.com/file/d/1qVhyUmvdmpD2AYl-2Cijl-2xeJgduJLL/view?usp=drive_link

 

After starting SIoT.bat on the computer, initialize the parameters for MQTT in UNIHIKER K10: set the IP address as the local computer's IP, the username as SIoT, and the password as dfrobot.
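For reference, the computer-side scripts in Task 2 talk to the same SIoT server with the siot Python library, but they use a different address: because SIoT runs on the computer itself, the scripts connect to 127.0.0.1, while the UNIHIKER K10 must be given the computer's LAN IP (for example 192.168.9.27, as used in the examples below). A minimal connection sketch using the same parameters as the full scripts later in this tutorial:

CODE
import siot

# SIoT runs locally on this computer, so the script connects to 127.0.0.1;
# the UNIHIKER K10 connects to this computer's LAN IP instead (e.g. 192.168.9.27).
siot.init(client_id="32867041742986824", server="127.0.0.1", port=1883,
          user="siot", password="dfrobot")
siot.connect()
siot.loop()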

 

After a successful MQTT connection, use the cache text and show cached content blocks to display "Connect MQTT Successfully" on the UNIHIKER K10's screen. This visual feedback confirms the establishment of the communication channel. Refer to the detailed commands shown in the image below.

 

 

When the program is running, the following effect will be observed, as shown in the picture below. This indicates that UNIHIKER K10 has successfully connected to the MQTT server.

 

 

3.2 Task 2: Computer Vision Detection and Data Upload

 

Next, we'll implement facial keypoint detection and hand gesture recognition using the computer's webcam. The detection results will then be sent to SIoT for further processing by UNIHIKER K10.

 

(1) Prepare the Computer Environment

First, we need to install the required Python dependencies. Open a new Mind+ window, navigate to the Mode Switch section, and select "Python Mode".

 

 

In Python Mode, click Code in the toolbar, then navigate to Library Management; the Library Installation page will open for dependency management.

 

 

Click PIP Mode in the Library Management interface and execute the following commands sequentially to install required libraries:

CODE
pip install mediapipe
pip install ultralytics
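
To confirm the installation worked, you can optionally run a short check in Python Mode; this snippet only imports the detection libraries and prints their versions, and is not part of the project code:

CODE
# Optional sanity check: import the detection libraries and print their versions.
import cv2
import mediapipe
import ultralytics

print("OpenCV:", cv2.__version__)
print("MediaPipe:", mediapipe.__version__)
print("Ultralytics:", ultralytics.__version__)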

 

 

(2) Write the Program

 

STEP One: Create Topics

Access "Your Computer IP Address:8080", such as"192.168.9.27:8080",in a web browser on your computer.

 

 

Enter the username 'SIoT' and password 'dfrobot' to log in to the SIoT platform.

 

 

After logging into the SIoT platform, navigate to the Topic section and create two topics: 'value' (stores the servo angle and other related tracking data) and 'fan' (stores fan control commands, e.g., on/off). Refer to the operations shown in the image below.

 

 

Next, let's write the code for computer camera detection, which includes two main functions: (1) Head Tracking Control and (2) Gesture Recognition Control.

 

Follow these steps to create a new Python file named "visiondetect.py" in Mind+: click the New File button in the Catalog and enter "visiondetect.py" in the filename input field.

 

 

STEP Two: Head Tracking Control Code

We use the YOLOv8 pose estimation model to detect the nose position and calculate the servo control angle (45°–135°) from the horizontal coordinate of the nose in the image: angle = 45 + nose_x × 90, where nose_x is the nose's normalized x-coordinate (0–1). For example, a nose at the horizontal center of the frame (nose_x = 0.5) maps to 45 + 0.5 × 90 = 90°. The angle value is then published to the value topic on the SIoT platform. The annotated code for this part is as follows:

CODE
import cv2
import numpy as np
import time
import siot
from ultralytics import YOLO

# Initialize the YOLO Pose model for human pose estimation
model = YOLO('yolov8n-pose.pt')

# Set up the video capture device
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not open video stream.")

# SIoT runs on this computer, so the script connects to the MQTT server at 127.0.0.1
# (the UNIHIKER K10 connects to this computer's LAN IP instead, e.g. 192.168.9.27)
siot.init(client_id="32867041742986824", server="127.0.0.1", port=1883, user="siot", password="dfrobot")
siot.connect()
siot.loop()

# Detect nose position using YOLO pose estimation
def get_head_position(frame):
    """
    Detect the position of the nose using YOLO pose estimation
    :param frame: Input image frame
    :return: Normalized x-coordinate of nose and (x,y) pixel coordinates
    """
    try:
        results = model(frame, verbose=False, device='cpu')
        if results and results[0].boxes.shape[0] > 0:
            person_box = results[0].boxes.xyxy[0]
            conf = results[0].boxes.conf[0] if len(results[0].boxes.conf) > 0 else 1.0
            if conf > 0.5:
                if results[0].keypoints.xy.shape[1] >= 17:
                    nose = results[0].keypoints.xy[0][0].cpu().numpy()
                    return nose[0] / frame.shape[1], (int(nose[0]), int(nose[1]))  # Return normalized position and pixel coords
    except Exception as e:
        print(f"Keypoint detection error: {str(e)}")
    return None, None  # Return None if detection fails

# Main processing loop
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # Mirror the frame for natural interaction
    frame = cv2.flip(frame, 1)
    
    # Head tracking and nose position visualization
    nose_x, nose_coords = get_head_position(frame)
    if nose_coords:
        # Draw a green circle at the nose position
        cv2.circle(frame, nose_coords, 7, (0, 255, 0), -1)
        # Display nose coordinates
        cv2.putText(frame, f"Nose ({nose_coords[0]},{nose_coords[1]})",
                    (nose_coords[0] + 10, nose_coords[1]),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        # Calculate servo angle based on nose position
        angle = 45 + (nose_x * 90)  # Map nose position to 45-135 degree range
        print(f"[YOLO Head Tracking] Current servo angle: {angle:.1f}°")
        siot.publish_save(topic="siot/value", data=angle)

    # Display system status
    cv2.putText(frame, f'Head Tracking: {"Active" if nose_coords else "Inactive"}',
                (10, frame.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)

    # Show the processed frame
    cv2.imshow('Head Position Control', frame)

    # Exit loop on ESC key press
    if cv2.waitKey(5) & 0xFF == 27:
        break

# Clean up resources
cap.release()
cv2.destroyAllWindows()

 

STEP Three: Gesture Recognition Control

We detect 21 hand keypoints with the MediaPipe hand tracking module and calculate finger joint angles to recognize three gestures: open palm, fist, and thumb-up. The corresponding fan control commands (on/off) are then published to the fan topic on the SIoT platform. The annotated code for this part is as follows:

CODE
import cv2
import mediapipe as mp
import numpy as np
import time
import siot
import math

# Initialize the MediaPipe Hands module for hand tracking
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False,
                       max_num_hands=1,
                       min_detection_confidence=0.6,
                       min_tracking_confidence=0.6)

# Initialize the MediaPipe drawing utilities
mp_drawing = mp.solutions.drawing_utils

# Set up the video capture device
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Error: Could not open video stream.")

# Initialize the SIoT connection for IoT communication
# (SIoT runs on this computer, so the MQTT server address is 127.0.0.1)
siot.init(client_id="32867041742986824", server="127.0.0.1", port=1883, user="siot", password="dfrobot")
siot.connect()
siot.loop()

# Calculate the angle between two 2D vectors
def vector_2d_angle(v1, v2):
    """
    Calculate the angle between two 2D vectors
    :param v1: First vector [x, y]
    :param v2: Second vector [x, y]
    :return: Angle in degrees
    """
    v1_x = v1[0]
    v1_y = v1[1]
    v2_x = v2[0]
    v2_y = v2[1]
    try:
        angle_ = math.degrees(math.acos((v1_x * v2_x + v1_y * v2_y) / (((v1_x ** 2 + v1_y ** 2) ** 0.5) * ((v2_x ** 2 + v2_y ** 2) ** 0.5))))
    except:
        angle_ = 65535.  # Represents an invalid angle
    if angle_ > 180.:
        angle_ = 65535.
    return angle_

# Calculate hand angles for gesture recognition
def hand_angle(hand_):
    """
    Calculate angles between key finger segments to determine hand gestures
    :param hand_: List of hand landmark coordinates
    :return: List of angles for thumb, index, middle, ring, and little fingers
    """
    angle_list = []
    # Thumb angle
    angle_list.append(vector_2d_angle(
        ((int(hand_[0][0]) - int(hand_[2][0])), (int(hand_[0][1]) - int(hand_[2][1]))),
        ((int(hand_[3][0]) - int(hand_[4][0])), (int(hand_[3][1]) - int(hand_[4][1])))
    ))
    # Index finger angle
    angle_list.append(vector_2d_angle(
        ((int(hand_[0][0]) - int(hand_[6][0])), (int(hand_[0][1]) - int(hand_[6][1]))),
        ((int(hand_[7][0]) - int(hand_[8][0])), (int(hand_[7][1]) - int(hand_[8][1])))
    ))
    # Middle finger angle
    angle_list.append(vector_2d_angle(
        ((int(hand_[0][0]) - int(hand_[10][0])), (int(hand_[0][1]) - int(hand_[10][1]))),
        ((int(hand_[11][0]) - int(hand_[12][0])), (int(hand_[11][1]) - int(hand_[12][1])))
    ))
    # Ring finger angle
    angle_list.append(vector_2d_angle(
        ((int(hand_[0][0]) - int(hand_[14][0])), (int(hand_[0][1]) - int(hand_[14][1]))),
        ((int(hand_[15][0]) - int(hand_[16][0])), (int(hand_[15][1]) - int(hand_[16][1])))
    ))
    # Little finger angle
    angle_list.append(vector_2d_angle(
        ((int(hand_[0][0]) - int(hand_[18][0])), (int(hand_[0][1]) - int(hand_[18][1]))),
        ((int(hand_[19][0]) - int(hand_[20][0])), (int(hand_[19][1]) - int(hand_[20][1])))
    ))
    return angle_list

# Determine hand gesture based on calculated angles
def h_gesture(angle_list):
    """
    Classify hand gesture based on joint angles
    :param angle_list: List of angles for each finger
    :return: Gesture string label
    """
    thr_angle = 65.       # Threshold for most fingers
    thr_angle_thumb = 53. # Threshold for thumb
    thr_angle_s = 49.     # Threshold for open hand
    gesture_str = "Unknown"
    
    if 65535. not in angle_list:
        # Fist gesture: all fingers bent
        if (angle_list[0] > thr_angle_thumb) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and (
                angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
            gesture_str = "Fist"
        # Open hand gesture: all fingers straight
        elif (angle_list[0] < thr_angle_s) and (angle_list[1] < thr_angle_s) and (angle_list[2] < thr_angle_s) and (
                angle_list[3] < thr_angle_s) and (angle_list[4] < thr_angle_s):
            gesture_str = "Open Hand"
        # Thumb up gesture: only thumb straight
        elif (angle_list[0] < thr_angle_s) and (angle_list[1] > thr_angle) and (angle_list[2] > thr_angle) and (
                angle_list[3] > thr_angle) and (angle_list[4] > thr_angle):
            gesture_str = "Thumb Up"
    return gesture_str

# Track the last recognized gesture to avoid repeated actions
last_gesture = None

# Main processing loop
while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # Mirror the frame for natural interaction
    frame = cv2.flip(frame, 1)
    rgb_image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Process hand landmarks
    hand_results = hands.process(rgb_image)
    if hand_results.multi_hand_landmarks:
        for hand_landmarks in hand_results.multi_hand_landmarks:
            # Draw hand landmarks and connections
            mp_drawing.draw_landmarks(
                frame,
                hand_landmarks,
                mp_hands.HAND_CONNECTIONS,
                mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2),
                mp_drawing.DrawingSpec(color=(255, 0, 0), thickness=2)
            )

            # Process hand gesture recognition
            hand_local = [(lm.x * frame.shape[1], lm.y * frame.shape[0]) for lm in hand_landmarks.landmark]
            if hand_local:
                angle_list = hand_angle(hand_local)
                gesture_str = h_gesture(angle_list)
                cv2.putText(frame, f"Gesture: {gesture_str}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

                # Send control commands based on gesture changes
                if gesture_str != last_gesture:
                    if gesture_str == "Open Hand":
                        print("[Gesture] Open hand detected")
                        print("[Control Command] Fan turned on")
                        siot.publish(topic="siot/fan", data="on")
                    elif gesture_str == "Fist":
                        print("[Gesture] Fist detected")
                        print("[Control Command] Fan turned off")
                        siot.publish(topic="siot/fan", data="off")
                    elif gesture_str == "Thumb Up":
                        print("[Control Command] Special action triggered")
                    last_gesture = gesture_str

    # Display system status
    cv2.putText(frame, f'Hand Tracking: {"Active" if hand_results.multi_hand_landmarks else "Inactive"}',
                (10, frame.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)

    # Show the processed frame
    cv2.imshow('Gesture Control', frame)

    # Exit loop on ESC key press
    if cv2.waitKey(5) & 0xFF == 27:
        break

# Clean up resources
cap.release()
cv2.destroyAllWindows()

 

Combine the head tracking code and the gesture recognition code into a single script, then copy the merged code into the "visiondetect.py" file created in Mind+. A sketch of how the two parts fit into one loop is shown below. (Note: the complete "visiondetect.py" file is provided as an attachment for reference.)
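
The two scripts share the same imports, the same siot initialization, and the same capture loop, so the merged file only needs one cv2.VideoCapture, one siot.init, and one while loop; the per-frame head tracking and gesture handling simply run one after the other inside that loop. Below is a condensed sketch of the merged main loop, assuming the helper functions and initialization code from the two sections above are kept unchanged (the attached file remains the full reference):

CODE
# Merged main-loop sketch: assumes model, hands, mp_hands, mp_drawing, cap and
# the helpers get_head_position(), hand_angle(), h_gesture() are defined exactly
# as in the two code sections above, and that siot.init()/connect()/loop()
# have already been called once.
last_gesture = None

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break
    frame = cv2.flip(frame, 1)  # mirror for natural interaction

    # --- Head tracking (YOLO pose) ---
    nose_x, nose_coords = get_head_position(frame)
    if nose_coords:
        cv2.circle(frame, nose_coords, 7, (0, 255, 0), -1)
        angle = 45 + (nose_x * 90)                      # map nose position to 45-135 degrees
        siot.publish_save(topic="siot/value", data=angle)

    # --- Gesture recognition (MediaPipe Hands) ---
    rgb_image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    hand_results = hands.process(rgb_image)
    if hand_results.multi_hand_landmarks:
        for hand_landmarks in hand_results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            hand_local = [(lm.x * frame.shape[1], lm.y * frame.shape[0])
                          for lm in hand_landmarks.landmark]
            gesture_str = h_gesture(hand_angle(hand_local))
            if gesture_str != last_gesture:
                if gesture_str == "Open Hand":
                    siot.publish(topic="siot/fan", data="on")    # turn the fan on
                elif gesture_str == "Fist":
                    siot.publish(topic="siot/fan", data="off")   # turn the fan off
                last_gesture = gesture_str

    cv2.imshow('Smart Tracking Fan', frame)
    if cv2.waitKey(5) & 0xFF == 27:                     # exit on ESC
        break

cap.release()
cv2.destroyAllWindows()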

Click Run to start the program.

 

 

When your face is fully visible in the window, the program will detect the position of your nose and display its coordinates. Make different hand gestures in front of the camera; the recognized gesture results will appear in the video feed.

 

 

Monitor the terminal output for real-time status updates, as shown in the figure below.

 

 

 

3.3 Task 3: UNIHIKER K10 Control Loop

 

Next, we'll implement the final component: controlling the servo motor and fan based on messages received from SIoT by UNIHIKER K10.

 

(1) Hardware Setup

To enable the fan to follow facial movements, mount the fan onto the servo motor using the structure shown below.

 

 

As shown below, use a 3P cable to connect the hardware components to the UNIHIKER K10: (1) connect the fan to port P0; (2) connect the servo motor to port P1.

 

 

(2) Software Preparation

Make sure that Mind+ is opened and the UNIHIKER board has been successfully loaded. Once confirmed, you can proceed to write the project program.

Click Extensions in the Mind+ toolbar and navigate to the Actuator category. Then locate the Micro Servo library in the list and click to load it.

 

 

(3)Write the Program

 

STEP One: Subscribe to SIoT Topics on UNIHIKER K10

Add an MQTT subscribe to block and enter "siot/fan" in the topic input field; then add another MQTT subscribe to block and enter "siot/value".

 

 

Then, use the cache local image and show cache content blocks to display the Smart Tracking Fan project's cover image on UNIHIKER K10's screen.

 

 

STEP Two: Control the fan

Use the "When MQTT message received from topic_0" block to enable the smart terminal to process commands from siot/fan

 

 

Use the if-then block and comparison operators to set GPIO pin P0 based on the MQTT message. If the MQTT message equals "on", add a set P0 to HIGH block to turn the fan on; if the MQTT message equals "off", add a set P0 to LOW block to turn the fan off.

 

 

STEP Three: Control the servo motor

Use the "When MQTT message received from topic_0" block again to enable servo angle control based on messages from "siot/value".Drag another "When MQTT message received from topic_0" block into the programming area and change the topic to "siot/value" in the block's configuration.

 

 

Use the set pin P1 servo and convert string to integer blocks to translate MQTT angle commands into servo movements. Add a convert string to integer block inside the MQTT message handler, then link its integer output to the angle input of a set pin P1 servo block.
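
One practical detail: the head-tracking script publishes the angle as a decimal value such as 92.3. If the string-to-integer conversion on the K10 does not handle decimal strings the way you expect, an easy workaround is to round the angle on the computer side before publishing, for example by changing the publish line in the head-tracking loop to:

CODE
# Optional tweak in the head-tracking loop: publish a whole-number angle
# so the K10 only ever receives integer strings such as "92".
siot.publish_save(topic="siot/value", data=int(round(angle)))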

 

 

Below is the reference for the complete program.

 

 

4. Upload the Program and Observe the Effect

Click the Upload button. When the upload progress reaches 100%, the program has been successfully uploaded to the UNIHIKER K10.

STEPS:

(1) Click the Run button to start visiondetect.py.

(2) Position your face in front of the camera:

- The servo should rotate to track your nose position, with angles displayed in the video feed.

- Make hand gestures (open palm/fist) to toggle the fan on/off.

5. Knowledge Hub

5.1 What is the Role of Face Keypoint Detection Models?

Face keypoint detection models identify and localize specific anatomical points on the human face (e.g., eyes, nose, mouth, jawline).

These models serve as the foundation for:

- Human-Computer Interaction (HCI): Enables hands-free control of devices via facial expressions or head movements (e.g., smart home systems, gaming interfaces).

- Biometrics & Security: Facilitates face recognition for access control, attendance systems, or payment authentication.

- Healthcare: Aids in diagnosing facial nerve disorders or tracking patient progress in rehabilitation therapies.
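
In this project, the yolov8n-pose.pt model used in Task 2 outputs 17 body keypoints in the standard COCO order, and the first five are facial landmarks (nose, left eye, right eye, left ear, right ear); the head-tracking code uses only index 0, the nose. The short sketch below shows how the facial keypoints could be read from a single webcam frame, assuming the same model file as in Task 2:

CODE
import cv2
from ultralytics import YOLO

# The first five COCO keypoints returned by yolov8n-pose.pt are facial landmarks.
FACE_KEYPOINTS = ["nose", "left eye", "right eye", "left ear", "right ear"]

model = YOLO('yolov8n-pose.pt')
cap = cv2.VideoCapture(0)
success, frame = cap.read()
cap.release()

if success:
    results = model(frame, verbose=False)
    if results and len(results[0].keypoints.xy) > 0:
        keypoints = results[0].keypoints.xy[0]          # (17, 2) keypoints of the first person
        for name, (x, y) in zip(FACE_KEYPOINTS, keypoints[:5].tolist()):
            print(f"{name}: ({x:.0f}, {y:.0f}) px")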

5.2 What is the Role of Hand Keypoint Detection Models?

Hand keypoint detection models locate 21+ anatomical landmarks on the hand.

These models serve as the foundation for:

- Gesture-Based Control: Allows natural interaction with IoT devices (e.g., smart fans, home appliances) without physical contact.

- Sign Language Translation: Converts manual signs into text or speech, assisting deaf communities in communication.

- Robotics & Prosthetics: Guides robotic hand movements or controls prosthetic limbs based on the user's natural hand gestures.
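
Similarly, the MediaPipe Hands module used in Task 2 returns 21 landmarks per hand, indexed from 0 (wrist) to 20 (little-finger tip); the gesture code derives finger-bend angles from these points. The sketch below reads one webcam frame and prints the five fingertip positions, using the same MediaPipe setup as in Task 2:

CODE
import cv2
import mediapipe as mp

# MediaPipe Hands landmark indices of the five fingertips (index 0 is the wrist).
FINGERTIPS = {"thumb": 4, "index": 8, "middle": 12, "ring": 16, "little": 20}

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
cap = cv2.VideoCapture(0)
success, frame = cap.read()
cap.release()

if success:
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        h, w = frame.shape[:2]
        for name, idx in FINGERTIPS.items():
            lm = result.multi_hand_landmarks[0].landmark[idx]
            print(f"{name} tip: ({int(lm.x * w)}, {int(lm.y * h)}) px")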

6. Appendix of Materials

gesturetrackfan.zip (6.5 MB)
License: All Rights Reserved