Human Segmentation from Image | Mediapipe Selfie Segmentation Model x UNIHIKER

Introduction

This project connects an external USB camera to the UNIHIKER board and uses MediaPipe's Selfie Segmentation model to detect a person in the camera feed and separate them from the background.

 

Project Objectives

Learn how to use MediaPipe's Selfie Segmentation model for human segmentation.

 

HARDWARE LIST
1 UNIHIKER - IoT Python Single Board Computer with Touchscreen
1 Type-C&Micro 2-in-1 USB Cable
1 USB camera

Practical Process

1. Hardware Setup

Connect the camera to the USB port of the UNIHIKER.

 

 

Connect the UNIHIKER board to the computer via the USB cable.

 

2. Software Development

Step 1: Open Mind+ and remotely connect to the UNIHIKER.

 

 

Step 2: Find the folder named "AI" under "Files in UNIHIKER". Inside it, create a folder named "Human Segmentation from Image Based on Mediapipe and UNIHIKER", then create a new project file named "main.py".
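Before writing the full program, it is worth confirming that the USB camera is detected. A minimal check, assuming the camera enumerates as video device 0 (the index may differ if other cameras are attached):

CODE
import cv2

# Try to open the first video device and grab one frame.
cap = cv2.VideoCapture(0)
if cap.isOpened():
    success, frame = cap.read()
    print("Camera OK, frame shape:", frame.shape if success else "no frame read")
else:
    print("Camera not detected; check the USB connection or try another index.")
cap.release()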

Sample Program:

CODE
import cv2
import mediapipe as mp
import numpy as np

mp_selfie_segmentation = mp.solutions.selfie_segmentation

# Background color used to replace everything outside the person (light gray, BGR).
BG_COLOR = (192, 192, 192)

# Open the USB camera and keep the frames small so the board can process them in real time.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)

# Show the output full screen on the UNIHIKER display.
cv2.namedWindow('MediaPipe Selfie Segmentation', cv2.WND_PROP_FULLSCREEN)
cv2.setWindowProperty('MediaPipe Selfie Segmentation', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)

# model_selection=1 selects the landscape model, which suits webcam-style input.
with mp_selfie_segmentation.SelfieSegmentation(model_selection=1) as selfie_segmentation:
    bg_image = None
    while cap.isOpened():
        success, image = cap.read()
        if not success:
            print("Ignoring empty camera frame.")
            continue

        # Mirror the frame and convert BGR (OpenCV) to RGB (MediaPipe).
        image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
        # Mark the image as read-only so MediaPipe can avoid an internal copy.
        image.flags.writeable = False
        results = selfie_segmentation.process(image)

        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Turn the single-channel probability mask into a 3-channel boolean mask:
        # True where the model believes the pixel belongs to the person.
        condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
        if bg_image is None:
            bg_image = np.zeros(image.shape, dtype=np.uint8)
            bg_image[:] = BG_COLOR
        # Keep the person from the original frame; paint the rest with the background color.
        output_image = np.where(condition, image, bg_image)
        # Rotate to match the portrait orientation of the UNIHIKER screen.
        output_image = cv2.rotate(output_image, cv2.ROTATE_90_COUNTERCLOCKWISE)
        cv2.imshow('MediaPipe Selfie Segmentation', output_image)
        # Exit when the ESC key is pressed.
        if cv2.waitKey(5) & 0xFF == 27:
            break

cap.release()
cv2.destroyAllWindows()

 

3. Run and Debug

Step 1: Run the main program

Run the "main.py" program. The screen initially shows the real-time image captured by the camera. Aim the camera at a person: the person in the video is separated from the background, kept as in the original image, while the background is replaced with a gray color. This is the segmentation effect.

 

 

4. Program Analysis

In the "main.py" file above, we mainly use the OpenCV library to capture the real-time video stream from the camera, and then use the MediaPipe Selfie Segmentation model to segment the person in each frame. The overall process is as follows.

1. Open the camera: the program first opens the camera to obtain the real-time video stream.

2. Read each frame: In an infinite loop, the program reads each frame from the camera.

3. Image preprocessing: The captured image is first flipped (because images captured by the camera are usually mirrored), and then converted to RGB format (because images read by OpenCV are in BGR format by default, while most image processing and computer vision algorithms assume input images are in RGB format).

4. Apply the selfie segmentation model: the preprocessed image is input into the selfie segmentation model, which outputs a segmentation mask of the same size as the input image, and each pixel value in the segmentation mask indicates the probability that the corresponding image pixel is a foreground (i.e., a person).

5. Generate output image: generate an output image from the segmentation mask and the original frame. The program first builds a condition matrix, where each element indicates whether the corresponding pixel is foreground. It then uses NumPy's where function to compose the output: foreground pixels keep their original values, while background pixels are set to the specified background color (see the sketch after this list).

6. Display output image: display the generated output image.

7. Check user input: if the user presses the ESC key, exit the loop and end the program.

The selfie segmentation model itself works by learning a mapping from an image to a segmentation mask using deep learning methods. The model is typically a Convolutional Neural Network (CNN) and requires a large number of images with segmentation annotations for training. During training, the network learns to distinguish person from background by minimizing the difference between the predicted segmentation mask and the ground-truth mask.
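To make steps 4 and 5 concrete, here is a minimal sketch that runs the same mask-and-compose logic on a single image file instead of a video stream. The file name "person.jpg" is only a placeholder for any test photo:

CODE
import cv2
import mediapipe as mp
import numpy as np

mp_selfie_segmentation = mp.solutions.selfie_segmentation

# "person.jpg" is a placeholder; use any photo that contains a person.
image = cv2.imread("person.jpg")
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# model_selection=0 selects the general model, suited to static images.
with mp_selfie_segmentation.SelfieSegmentation(model_selection=0) as seg:
    results = seg.process(rgb)

# The mask holds one foreground probability per pixel; thresholding it
# yields the boolean condition matrix described in step 5.
condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1

bg_image = np.zeros(image.shape, dtype=np.uint8)
bg_image[:] = (192, 192, 192)

# Foreground pixels keep their original values; background pixels turn gray.
output = np.where(condition, image, bg_image)
cv2.imwrite("person_segmented.jpg", output)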

 

MediaPipe Selfie Segmentation Model

The MediaPipe Selfie Segmentation model is a real-time portrait segmentation model specifically optimized for mobile devices. It can distinguish between the person and the background in an image, allowing developers to achieve various effects in mobile or web applications, such as changing the background and applying filters.

The Selfie Segmentation model is based on a U-Net architecture optimized for mobile devices. U-Net is a very popular deep learning architecture whose defining feature is its U-shaped structure: a contracting (encoder) path followed by an expanding (decoder) path. It can extract deep image features while preserving the spatial information of the image, which makes it very well suited to image segmentation tasks.

The Selfie Segmentation model can run on real-time video streams and is very efficient. It can run smoothly even on some low-end mobile devices. The model takes an RGB image as input and produces an output segmentation mask of the same size as the input image. Each pixel value in the segmentation mask represents the probability that the corresponding pixel in the image belongs to the person.

MediaPipe provides complete model files and related APIs, making it very easy for developers to integrate the Selfie Segmentation model into their own applications.
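For example, the same segmentation mask that drives the gray background in "main.py" can drive a background-blur filter instead. A minimal sketch of this variation (the threshold and blur kernel size are illustrative choices):

CODE
import cv2
import mediapipe as mp
import numpy as np

mp_selfie_segmentation = mp.solutions.selfie_segmentation

cap = cv2.VideoCapture(0)
with mp_selfie_segmentation.SelfieSegmentation(model_selection=1) as seg:
    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            continue
        results = seg.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
        # Replace the background with a heavily blurred copy of the frame
        # instead of a flat color.
        blurred = cv2.GaussianBlur(frame, (55, 55), 0)
        output = np.where(condition, frame, blurred)
        cv2.imshow('Blurred Background', output)
        if cv2.waitKey(5) & 0xFF == 27:  # ESC to quit
            break
cap.release()
cv2.destroyAllWindows()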

 

 

Feel free to join our UNIHIKER Discord community! You can engage in more discussions and share your insights!

License
All Rights Reserved