Introduction
This project implements object classification on UNIHIKER using ShuffleNetV2. A USB camera connected to UNIHIKER captures live images, the model classifies the object in each frame, and the three most likely categories are displayed on the screen.
Project Objectives
Learn how to classify objects in camera images using OpenCV and the ShuffleNetV2 model from the ncnn library
Software
Practical Process
1. Hardware Setup
Connect the camera to the USB port of UNIHIKER.
Connect the UNIHIKER board to the computer via USB cable.
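Before running the main program, it can help to confirm that the camera was detected. The following is a minimal sketch, assuming UNIHIKER exposes USB cameras as Video4Linux device nodes under /dev/video* (the usual Linux convention); the helper name `list_video_devices` is hypothetical and not part of the lesson files.

```python
import glob


def list_video_devices(pattern="/dev/video*"):
    """Return the video device nodes matching the pattern, sorted by name."""
    return sorted(glob.glob(pattern))


# Assumption: a detected USB camera shows up as /dev/video0 or similar.
devices = list_video_devices()
if devices:
    print("Video devices found:", devices)
else:
    print("No video device found. Check the USB connection.")
```

If the list is empty, reconnecting the camera before starting "main.py" is a reasonable first step.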
2. Software Development
Step 1: Open Mind+, and remotely connect to UNIHIKER.
Step 2: Find the folder named "AI" under "Files in UNIHIKER", and create a folder named "Object Classification Project Using ShuffleNetV2 Based on UNIHIKER" inside it. Import the dependency files for this lesson.
Step 3: Create a new project file in the same directory as the above file and name it "main.py".
Sample Program:
import os
os.environ["NCNN_HOME"] = os.getcwd()
import sys
import cv2
import time
import numpy as np
import ncnn
from ncnn.model_zoo import get_model
from utils import print_topk

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 320)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)

cv2.namedWindow('image', cv2.WND_PROP_FULLSCREEN)  # Set the window to be full screen.
cv2.setWindowProperty('image', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)  # Set the window to be full screen.

net = get_model("shufflenetv2", num_threads=4, use_gpu=False)

while cap.isOpened():
    success, image = cap.read()
    if not success:
        print("Ignoring empty camera frame.")
        # If loading a video, use 'break' instead of 'continue'.
        continue

    cls_scores = net(image)
    image = print_topk(cls_scores, 3, image)

    image = cv2.rotate(image, cv2.ROTATE_90_COUNTERCLOCKWISE)
    cv2.imshow('image', image)
    if cv2.waitKey(5) & 0xFF == 27:
        break

cap.release()
3. Run and Debug
Step 1: Run the Main Program
Run the "main.py" program. The screen initially shows the live image captured by the camera. Aim the camera at an object (e.g., a mouse), and the three results with the highest probability are displayed on the screen. In this example, the most probable one is "mouse,computer mouse", i.e. a mouse.
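The `print_topk` helper comes from the lesson's dependency files and is not shown here. As a rough illustration of how the three most probable results could be picked, the sketch below assumes the model returns a flat vector with one score per class. The function name `top_k`, the labels, and the scores are hypothetical.

```python
import numpy as np


def top_k(scores, labels, k=3):
    """Return the k (label, score) pairs with the highest scores, best first."""
    scores = np.asarray(scores, dtype=np.float32)
    # argsort sorts ascending, so take the last k indices and reverse them.
    order = np.argsort(scores)[-k:][::-1]
    return [(labels[i], float(scores[i])) for i in order]


# Hypothetical labels and scores, for illustration only.
labels = ["mouse,computer mouse", "keyboard", "remote control", "cup"]
scores = [0.71, 0.12, 0.15, 0.02]

results = top_k(scores, labels)
for label, score in results:
    print(f"{label}: {score:.2f}")
```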
4. Program Analysis
In the above "main.py" file, we primarily use the OpenCV library to access the camera, read the image from the camera in real time, and then use the ShuffleNetV2 model to categorize the image and print out the top three most likely categories on the image. The overall process is as follows.
① Initialization: When the program starts, it sets the NCNN_HOME environment variable to the current working directory. Then, it opens the default camera and sets the camera's resolution (320x240) and buffer size. Next, it creates a full-screen window named 'image' to display the images. Finally, it retrieves the ShuffleNetV2 model from the ncnn model library, configured to run on the CPU with four threads.
② Main loop: The program enters an infinite loop, where the following operations are performed in each iteration:
- A frame is read from the camera. If the reading fails, the frame is ignored, and the loop continues to the next iteration.
- The ShuffleNetV2 model is used to classify the read frame, obtaining scores for each category.
- The top three most likely categories are printed on the read frame. The printing is done by adding text to the image with the name of the category and the corresponding score.
- Rotate the printed frame 90 degrees counterclockwise and display it in the window. The rotation is to make the display direction of the image consistent with the camera's shooting direction.
③ User interaction: At the end of each loop, the program checks the user's keyboard input. If the user presses the 'ESC' key, the program exits the main loop.
④ End: When the main loop ends, the program releases the camera and exits. This frees the resources occupied by the camera device so that it can be used by other programs.
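The rotation step in the main loop can be illustrated without a camera. This is a minimal sketch that uses NumPy's `np.rot90` as a stand-in for `cv2.rotate(image, cv2.ROTATE_90_COUNTERCLOCKWISE)`. It assumes the display is in portrait orientation (240 wide, 320 high), which is not stated in the text above; the blank frame is a placeholder for a real camera image.

```python
import numpy as np

# Stand-in for a 320x240 camera frame: 240 rows, 320 columns, 3 colour channels.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
frame[0, -1] = (255, 255, 255)  # Mark the top-right pixel.

# Rotate 90 degrees counterclockwise in the image plane.
rotated = np.rot90(frame, k=1)

print(frame.shape, "->", rotated.shape)
# After a counterclockwise rotation, the top-right pixel ends up at the top-left.
print("Marked pixel is now at top-left:", bool((rotated[0, 0] == 255).all()))
```

The landscape frame becomes a portrait frame, which is why the rotated image fits a portrait display.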
5. Knowledge Corner - ShuffleNetV2 Model
ShuffleNet V2 is a lightweight deep neural network architecture designed to run on devices with limited computational and memory resources, such as smartphones or embedded devices. It was proposed by researchers from Megvii (Face++) and Tsinghua University in 2018 and achieved excellent performance on the ImageNet image classification task.
ShuffleNet V2 builds on two operations introduced by the original ShuffleNet: Channel Shuffle and Pointwise Group Convolution. These operations reduce the computational load and the number of parameters in the model while maintaining good accuracy. ShuffleNet V2 keeps Channel Shuffle, but its basic unit splits the input channels into two branches (Channel Split) and uses ordinary 1x1 convolutions on one branch instead of grouped ones.
1. Channel Shuffle: This operation rearranges the order of the channels in the feature map so that channels from different groups or branches are interleaved. In this way, it increases information exchange between channels, thereby enhancing the model's representation capabilities.
2. Pointwise Group Convolution: This is a 1x1 convolution that divides the input feature map's channels into several groups and then performs convolution within each group. This reduces the computational load and the number of parameters, but the ShuffleNet V2 paper observes that a large number of groups increases memory access cost, which is why V2 avoids grouped 1x1 convolutions in its units.
Another characteristic of ShuffleNet V2 is the set of design guidelines behind its architecture: use equal input and output channel widths to minimize memory access cost, avoid excessive group convolution, limit network fragmentation (many small parallel branches), and reduce element-wise operations. These guidelines target actual speed on the device, not only the theoretical amount of computation.
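The two operations above can be sketched in a few lines of NumPy. This is an illustrative sketch under simple assumptions (a single feature map in channel-first layout, bias terms ignored), not the code used inside the ncnn model; the function names `channel_shuffle` and `pointwise_conv_params` are hypothetical.

```python
import numpy as np


def channel_shuffle(x, groups):
    """Interleave the channels of a (C, H, W) feature map across `groups` groups."""
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    # Split channels into (groups, channels_per_group), swap the two axes,
    # then flatten back so channels from different groups alternate.
    x = x.reshape(groups, c // groups, h, w)
    x = x.transpose(1, 0, 2, 3)
    return x.reshape(c, h, w)


def pointwise_conv_params(c_in, c_out, groups=1):
    """Weight count of a 1x1 convolution with the given number of groups."""
    return (c_in // groups) * c_out


# Six channels whose values equal their channel index, in two groups.
x = np.arange(6).reshape(6, 1, 1)
shuffled = channel_shuffle(x, groups=2)
print(shuffled[:, 0, 0].tolist())  # Channel order after shuffling.

# Grouping divides the weight count of a 1x1 convolution by the group count.
print(pointwise_conv_params(240, 240), pointwise_conv_params(240, 240, groups=3))
```

With two groups, the channel order 0, 1, 2, 3, 4, 5 becomes 0, 3, 1, 4, 2, 5, so each later group sees channels from both earlier groups.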
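The equal-channel principle can be checked with a little arithmetic. The ShuffleNet V2 paper models a 1x1 convolution on an h x w feature map with c_in input and c_out output channels as costing h*w*c_in*c_out FLOPs and h*w*(c_in + c_out) + c_in*c_out memory accesses. The sketch below compares channel configurations with identical FLOPs; the helper names and the chosen sizes are illustrative assumptions.

```python
def conv1x1_flops(h, w, c_in, c_out):
    """Multiply-accumulate count of a 1x1 convolution on an h x w feature map."""
    return h * w * c_in * c_out


def conv1x1_mac(h, w, c_in, c_out):
    """Memory access cost: read the input, write the output, read the weights."""
    return h * w * (c_in + c_out) + c_in * c_out


# Illustrative sizes: the product c_in * c_out is the same in every row,
# so the FLOPs are identical and only the memory access cost changes.
h = w = 56
for c_in, c_out in [(64, 64), (32, 128), (16, 256)]:
    flops = conv1x1_flops(h, w, c_in, c_out)
    mac = conv1x1_mac(h, w, c_in, c_out)
    print(f"c_in={c_in:3d} c_out={c_out:3d} flops={flops} mac={mac}")
```

For the same amount of computation, the balanced 64/64 configuration has the lowest memory access cost, and the cost grows as the channel counts become more unequal.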
In summary, ShuffleNet V2 is an efficient, lightweight deep neural network architecture suitable for image classification and other computer vision tasks on resource-constrained devices.
Feel free to join our UNIHIKER Discord community! You can engage in more discussions and share your insights!