OpenCV Feature Homography Target Tracking Using UNIHIKER

Introduction

In this project, we learn OpenCV feature-homography target tracking on UNIHIKER. An external USB camera is connected to the UNIHIKER board and used to detect a target and track it.

 

Project Objectives

Learn how to use the feature-homography method from the OpenCV library to detect a target and track it.

 

HARDWARE LIST
1 UNIHIKER - IoT Python Single Board Computer with Touchscreen
1 Type-C&Micro 2-in-1 USB Cable
1 USB camera

Software

- Mind+ Programming Software

 

Practical Process

1. Hardware Setup

Connect the camera to the USB port of UNIHIKER.

 

 

Connect the UNIHIKER board to the computer via USB cable.

 

2. Software Development

Step 1: Open Mind+, and remotely connect to UNIHIKER.

 

 

Step 2: Find the folder named "AI" under "Files in UNIHIKER", and create a folder named "OpenCV feature_homography target tracking based on UNIHIKER" inside it. Import the dependency files for this lesson (the helper modules that main.py imports, i.e. video.py, common.py, and plane_tracker.py from the OpenCV Python samples).

 

 

Step 3: Create a new project file in the same directory as the files above and name it "main.py".

Sample Program:

CODE
#!/usr/bin/env python
 
'''
Feature homography
==================
 
Example of using features2d framework for interactive video homography matching.
ORB features and FLANN matcher are used. The actual tracking is implemented by
PlaneTracker class in plane_tracker.py
 
Inspired by http://www.youtube.com/watch?v=-ZNYoL8rzPY
 
video: http://www.youtube.com/watch?v=FirtmYcC0Vc
 
Usage
-----
feature_homography.py [<video source>]
 
Keys:
   SPACE  -  pause video
 
Select a textured planar object to track by drawing a box with a mouse.
'''
 
# Python 2/3 compatibility
from __future__ import print_function
 
import numpy as np
import cv2 as cv
 
import video
from video import presets
import common
from common import getsize, draw_keypoints
from plane_tracker import PlaneTracker
 
class App:
    def __init__(self, src):
        self.cap = video.create_capture(src, presets['book'])
        self.frame = None
        self.paused = False
        self.tracker = PlaneTracker()
 
        cv.namedWindow('plane',cv.WND_PROP_FULLSCREEN)
        cv.setWindowProperty('plane', cv.WND_PROP_FULLSCREEN, cv.WINDOW_FULLSCREEN)
 
        self.rect_sel = common.RectSelector('plane', self.on_rect)
 
    def on_rect(self, rect):
        self.tracker.clear()
        self.tracker.add_target(self.frame, rect)
 
    def run(self):
        while True:
            playing = not self.paused and not self.rect_sel.dragging
            if playing or self.frame is None:
                ret, frame = self.cap.read()
                if not ret:
                    break
                self.frame = frame.copy()
            w, h = getsize(self.frame)
            # stack two views vertically: the live camera frame on top, the target image below
            vis = np.zeros((h*2, w, 3), np.uint8)
            vis[:h,:w] = self.frame
 
            if len(self.tracker.targets) > 0:
                target = self.tracker.targets[0]
                vis[h:,:] = target.image
                draw_keypoints(vis[h:,:], target.keypoints)
                x0, y0, x1, y1 = target.rect
                cv.rectangle(vis, (x0, y0+h), (x1, y1+h), (0, 255, 0), 2)
 
            if playing:
                tracked = self.tracker.track(self.frame)
                if len(tracked) > 0:
                    tracked = tracked[0]
                    cv.polylines(vis, [np.int32(tracked.quad)], True, (255, 255, 255), 2)
                    # draw match lines from the target keypoints (bottom half, offset by h)
                    # to their matched positions in the live frame (top half)
                    for (x0, y0), (x1, y1) in zip(np.int32(tracked.p0), np.int32(tracked.p1)):
                        cv.line(vis, (x0, y0+h), (x1, y1), (0, 255, 0))
 
            draw_keypoints(vis, self.tracker.frame_points)

            self.rect_sel.draw(vis)
            cv.imshow('plane', vis)
 
            ch = cv.waitKey(1)
            if ch == ord(' '):
                self.paused = not self.paused
            if ch == 27:
                break
 
 
if __name__ == '__main__':
    print(__doc__)
    import sys
    try:
        video_src = sys.argv[1]
    except IndexError:
        video_src = 0
 
    App(video_src).run()

 

3. Run and Debug

Step 1: Run the main program

Run the program "main.py". The screen is initially divided into two areas: the upper part shows the real-time image captured by the camera, and the lower part initially shows a black background. Point the camera at an object (here, a toy cart), then use the mouse to draw a box around the object on the screen. The framed cart is then displayed as the tracking target in the lower part. If you gently move the cart, you can see that it stays framed in the upper part of the screen and is connected to the target image at the bottom by green feature-match lines, which achieves the tracking purpose.

 

 

4. Program Analysis

In the above "main.py" file, we mainly use the OpenCV library to call the camera and obtain the real-time video stream, and then use ORB features and the FLANN matcher to track a specific planar region in the video. The user selects a region in the video with the mouse, and the program then tracks the position and pose of this region in subsequent video frames. The overall process is as follows.

 

① Initialize video source and tracker: the program first opens the video source and creates a PlaneTracker object for tracking.

② User-selected tracking region: the user selects a region in the video with the mouse, and the selected region will be added to the PlaneTracker as a tracking target.

③ Tracking processing: The program reads the video frame by frame and passes each frame to PlaneTracker for processing. PlaneTracker uses ORB features and the FLANN matcher to find features in the current frame that match the target, and computes the target's position and pose in that frame (a condensed sketch of this step is given after this list).

④ Display results: The program displays the tracking results (including the position and pose of the target) on the video in real time. If tracking succeeds, the target area is highlighted.

⑤ User interaction: The user can pause or resume video playback by pressing the space bar. If the user selects a new area, it replaces the previously tracked target.
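
Since plane_tracker.py is not listed here, the following is a minimal, self-contained sketch of the matching step it performs: ORB features are detected on a reference image of the target and on the current frame, matched with a FLANN matcher (an LSH index, suited to binary descriptors), filtered with a ratio test, and used to estimate a homography. The file names target.jpg and frame.jpg, the 0.75 ratio, and the 10-match threshold are illustrative assumptions, not values taken from the sample.

CODE
import numpy as np
import cv2 as cv

# placeholder images: a reference view of the target and the current camera frame
target = cv.imread('target.jpg', cv.IMREAD_GRAYSCALE)
frame = cv.imread('frame.jpg', cv.IMREAD_GRAYSCALE)

# 1. detect oriented FAST keypoints and compute rBRIEF binary descriptors
orb = cv.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(target, None)
kp2, des2 = orb.detectAndCompute(frame, None)

# 2. FLANN matcher with an LSH index, which works with binary ORB descriptors
FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH, table_number=6, key_size=12, multi_probe_level=1)
matcher = cv.FlannBasedMatcher(index_params, dict(checks=50))
matches = matcher.knnMatch(des1, des2, k=2)

# 3. ratio test: keep matches whose best candidate is clearly better than the second best
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# 4. estimate the homography that maps target points into the current frame
if len(good) >= 10:
    p0 = np.float32([kp1[m.queryIdx].pt for m in good])
    p1 = np.float32([kp2[m.trainIdx].pt for m in good])
    H, status = cv.findHomography(p0, p1, cv.RANSAC, 3.0)
    if H is not None:
        # project the target's corners into the frame to obtain the tracked quad
        h, w = target.shape
        corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
        quad = cv.perspectiveTransform(corners, H)
        print('tracked quad:\n', quad.reshape(-1, 2))
else:
    print('not enough matches:', len(good))

In the sample, PlaneTracker performs essentially this computation once per frame for each registered target.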

 

Knowledge Analysis

1. ORB Features (Oriented FAST and Rotated BRIEF)

ORB, short for Oriented FAST and Rotated BRIEF, is an efficient feature detection and description algorithm proposed by Ethan Rublee et al. in the 2011 paper "ORB: An efficient alternative to SIFT or SURF."

The ORB feature algorithm mainly consists of two parts: feature point detection and feature description.

Feature Point Detection: ORB uses the FAST corner detector for feature point detection. The FAST corner detector is a highly efficient corner detection algorithm that can quickly find corners in an image. However, the corners detected by the FAST corner detector lack orientation information. To address this issue, the ORB algorithm introduces orientation information for the corners, resulting in oriented feature points.

Feature Description: ORB uses the rBRIEF descriptor for feature description. BRIEF is a binary feature descriptor that generates a binary string as a descriptor for a region by comparing the brightness of pixel pairs within a small area of the image. However, BRIEF descriptors are not rotation invariant, meaning they change when the image is rotated. ORB introduces rotation invariance to BRIEF, resulting in the rBRIEF descriptor.

The main advantages of the ORB feature algorithm are its fast computation speed, low memory usage, and suitability for real-time applications and embedded devices. Its performance is comparable to algorithms like SIFT and SURF, but it is faster and consumes less memory.
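
As a minimal sketch (the file name test.jpg and the keypoint count are illustrative assumptions), ORB keypoints can be detected and visualized with OpenCV like this:

CODE
import cv2 as cv

# load an image in grayscale (test.jpg is a placeholder file name)
img = cv.imread('test.jpg', cv.IMREAD_GRAYSCALE)

# create an ORB detector that keeps up to 500 keypoints
orb = cv.ORB_create(nfeatures=500)

# detect oriented FAST corners and compute rBRIEF binary descriptors (32 bytes each)
keypoints, descriptors = orb.detectAndCompute(img, None)
print('keypoints:', len(keypoints), 'descriptor shape:', descriptors.shape)

# draw the detected keypoints for visual inspection
vis = cv.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv.imwrite('orb_keypoints.jpg', vis)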

 

2. FLANN Matcher (Fast Library for Approximate Nearest Neighbors)

FLANN (Fast Library for Approximate Nearest Neighbors) is a library used for fast nearest neighbor searches in large-scale datasets. In many computer vision and machine learning tasks, we often need to find the nearest neighbor of a point in high-dimensional space, which is computationally expensive. The FLANN library provides a fast and efficient way to perform this operation.

FLANN supports various types of data and distance metrics, including Euclidean distance, Manhattan distance, Hamming distance, etc., and can automatically choose the most suitable algorithm for your data. It provides a data structure called an index, which preprocesses the data into a specific format to facilitate faster nearest neighbor searches during queries.

In computer vision, FLANN is often used for feature matching. For instance, if we extract some feature points from two images and need to find the corresponding feature points in both images, we can use FLANN to quickly find the nearest neighbor for each feature point, thereby achieving feature matching.

In OpenCV, the FLANN matcher can be used through the cv2.FlannBasedMatcher class. This class provides methods such as knnMatch and radiusMatch for k-nearest neighbor matching and radius matching.
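
The snippet below is an illustrative sketch of the matcher on its own, using synthetic float descriptors in place of real features; the KD-tree parameters and the 0.75 ratio are common choices, not values prescribed by OpenCV.

CODE
import numpy as np
import cv2 as cv

# two sets of synthetic 128-D float descriptors standing in for real features
des1 = np.random.rand(500, 128).astype(np.float32)
des2 = np.random.rand(500, 128).astype(np.float32)

# a KD-tree index suits float descriptors (SIFT/SURF style);
# binary descriptors such as ORB would use the LSH index instead
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)  # how many index traversals to perform per query
flann = cv.FlannBasedMatcher(index_params, search_params)

# k-nearest-neighbour matching: the 2 best candidates for each query descriptor
matches = flann.knnMatch(des1, des2, k=2)

# ratio test: keep matches whose best candidate is clearly better than the runner-up
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print('good matches:', len(good))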

 

3. Optical Flow

Optical flow is an important technique in computer vision used to estimate the motion of objects from video sequences. Optical flow describes the motion changes of pixels in the image sequence over time, i.e., the movement trajectory of each pixel on the image plane as time progresses.

The basic assumption of optical flow is that the brightness of the image remains constant over a short period, meaning that a pixel's brightness is the same in both the current frame and the next frame. Additionally, optical flow assumes that neighboring pixels have similar motion, i.e., the motion of objects is continuous.

Optical flow methods are typically used to estimate the speed or direction of motion of objects in a video. For example, in fields like autonomous driving, video surveillance, and motion detection, optical flow methods have widespread applications.

In terms of computation, optical flow methods can be categorized into sparse optical flow and dense optical flow. Sparse optical flow only computes the optical flow at specific points (e.g., feature points), while dense optical flow computes the optical flow at every pixel.

In OpenCV, functions like cv2.calcOpticalFlowPyrLK and cv2.calcOpticalFlowFarneback can be used to calculate optical flow. cv2.calcOpticalFlowPyrLK is a sparse optical flow method based on the Lucas-Kanade method, while cv2.calcOpticalFlowFarneback is a dense optical flow method.
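
As a minimal sketch of sparse optical flow (assuming a camera at index 0 and illustrative parameter values), two consecutive frames can be compared like this:

CODE
import cv2 as cv

# grab two consecutive frames from the camera (index 0 assumed)
cap = cv.VideoCapture(0)
ret1, frame1 = cap.read()
ret2, frame2 = cap.read()
cap.release()

gray1 = cv.cvtColor(frame1, cv.COLOR_BGR2GRAY)
gray2 = cv.cvtColor(frame2, cv.COLOR_BGR2GRAY)

# pick corner-like points worth tracking in the first frame
p0 = cv.goodFeaturesToTrack(gray1, maxCorners=100, qualityLevel=0.3, minDistance=7)

if p0 is not None:
    # sparse Lucas-Kanade optical flow: where did those points move in the second frame?
    p1, status, err = cv.calcOpticalFlowPyrLK(gray1, gray2, p0, None,
                                              winSize=(15, 15), maxLevel=2)

    # draw the motion of each successfully tracked point
    for (x0, y0), (x1, y1) in zip(p0[status == 1], p1[status == 1]):
        cv.line(frame2, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)
    cv.imwrite('optical_flow.jpg', frame2)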

Project files: project.zip (11 KB)

Feel free to join our UNIHIKER Discord community! You can engage in more discussions and share your insights!

License: All Rights Reserved