Huskylens2 MCP with Python and Gemini LLM

RoniBandini Dec 01.2025


1. Firmware Upgrade

My HuskyLens 2 was an early maker release, which required a firmware upgrade to enable the latest features. The process was perfectly detailed and documented on the DFRobot website. I must pause here to offer my genuine thanks: the time and effort DFRobot invests in thorough documentation is invaluable. As makers working on multiple projects, nothing is more frustrating than losing time on setup and configuration simply because hardware companies neglect essential documentation.

You will need to download the following files:

Firmware image: huskylensV2-v1.1.6.1031.img.7z
Burning tool: K230BurningTool.zip
Driver installation tool: Zadig - Driver Installation Tool

All necessary steps and details are available in the DFRobot documentation.

Once firmware version 1.1.6 is successfully installed, navigate to the settings, connect the HuskyLens to your local Wi-Fi router, and ensure the MCP Server is enabled.

2. LLM and API Configuration

Next, you'll need a Google Gemini API key. You can obtain one easily at the Google AI Studio website: https://aistudio.google.com/app/api-keys. Google offers a generous free tier, and the paid options remain highly affordable for more intensive usage.

3. Client Setup

Finally, with the camera connected to your Wi-Fi network, note the IP address assigned to the MCP Server. Open the Python client script (HuskyMCPChat.py) in a text editor and set your Gemini API key and the MCP Server IP address in the script variables.

Run with $ python HuskyMCPChat.py
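The configuration step above can be sketched as follows. This is a minimal illustration and not the actual contents of HuskyMCPChat.py: the function names, the GEMINI_API_KEY environment variable, and the example IP, port, and path are all assumptions. Check the script and your camera's settings screen for the real values.

```python
import ipaddress
import os


def build_mcp_url(ip: str, port: int, path: str = "/") -> str:
    """Validate the camera's IPv4 address and assemble an HTTP URL."""
    # IPv4Address raises a ValueError subclass on typos such as "192.168.1"
    address = ipaddress.IPv4Address(ip)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"http://{str(address)}:{port}/{path.lstrip('/')}"


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from the environment (hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"set {env_var} before running the client")
    return key


if __name__ == "__main__":
    # Hypothetical values: use the IP shown on your HuskyLens and the
    # port/path that your firmware's MCP server actually exposes.
    print(build_mcp_url("192.168.1.50", 3000, "/sse"))
```

Reading the key from the environment rather than hard-coding it is a design choice of this sketch; it keeps the key out of any copy of the script you might share.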

Usage and Interactivity

With the Python client running, you can use a menu and natural-language commands to interact with the camera via the LLM:

Change Algorithms: Switch algorithms (e.g., "switch to face recognition").
Take Photos: Capture images, which are stored on the internal memory ("take a picture").
Visual Query: Ask the LLM what the camera currently sees based on the active algorithm ("what do you see?").
Combined Reasoning: Combine the camera's recognition data with an LLM prompt for queries such as: "Is there anything dangerous on the table?"
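To make the combined-reasoning idea concrete, here is a rough sketch of how recognition labels and a user question might be merged into a single prompt. The function name and the assumption that the labels arrive as a plain list of strings are mine; the real client and the real tool output may be structured differently.

```python
from typing import Iterable


def build_reasoning_prompt(labels: Iterable[str], question: str) -> str:
    """Combine recognition labels with the user's question into one LLM prompt."""
    # Deduplicate while keeping first-seen order, so repeated detections
    # of the same object do not bloat the prompt.
    cleaned = (label.strip() for label in labels)
    unique = list(dict.fromkeys(label for label in cleaned if label))
    scene = ", ".join(unique) if unique else "nothing recognized"
    return (
        f"The camera currently reports these objects: {scene}.\n"
        f"Using only that information, answer: {question}"
    )


if __name__ == "__main__":
    # Hypothetical labels standing in for a recognition result
    print(build_reasoning_prompt(
        ["knife", "cup", "knife"],
        "Is there anything dangerous on the table?",
    ))
```

The resulting string would then be sent to the LLM as an ordinary text prompt.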

HuskyLens MCP Tools Overview

The following tools are exposed by the HuskyLens MCP Server and are callable by the LLM:

get_recognition_result

Obtains the real-time recognition result from HuskyLens, including image data and recognized labels (e.g., object type, person name). The primary operation is get_result. This is crucial for visual reasoning and generating natural-language descriptions of the camera's view.

manage_applications

Used to manage and query all internal applications (algorithms) of the HuskyLens. Supports operations like current_application, switch_application, and application_list.
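For readers curious about what a call to one of these tools looks like on the wire, the sketch below builds an MCP tools/call request, which is a JSON-RPC 2.0 message. The tool name comes from the list above, but the argument keys ("operation", "algorithm") are assumptions: the actual schema is whatever the HuskyLens MCP Server advertises when the client lists its tools.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects each request to carry one
_request_ids = itertools.count(1)


def tool_call(tool: str, **arguments) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


if __name__ == "__main__":
    # "operation" and "algorithm" are assumed argument names
    request = tool_call(
        "manage_applications",
        operation="switch_application",
        algorithm="face recognition",
    )
    print(json.dumps(request, indent=2))
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging or inspecting traffic.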

multimedia_control

Provides control over the HuskyLens multimedia components, primarily the camera. The main operation is take_photo.

task_scheduler

Manages scheduled tasks. Call this tool when you need to create a timed or triggered action, such as "Take a picture when you see the keyboard" or "Take a picture after 3 seconds". Supports create and list operations. Tasks are defined by a trigger (optional, e.g., "tiger"), a handler (required, currently only supports take_photo), and an optional timestamp for the scheduled time.
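The task definition described above could be assembled along these lines. This is only a sketch: the key names ("operation", "trigger", "handler", "timestamp") mirror the wording of the tool description, but their exact spelling, and the assumption that the timestamp is expressed in Unix seconds, are not confirmed by the source.

```python
import time
from typing import Optional

SUPPORTED_HANDLERS = {"take_photo"}  # the only handler the description lists


def scheduler_create_args(
    trigger: Optional[str] = None,
    delay_seconds: Optional[float] = None,
    handler: str = "take_photo",
    now: Optional[float] = None,
) -> dict:
    """Build arguments for a hypothetical task_scheduler `create` call."""
    if handler not in SUPPORTED_HANDLERS:
        raise ValueError(f"unsupported handler: {handler!r}")
    args = {"operation": "create", "handler": handler}
    if trigger:
        # Optional: the label that should fire the task, e.g. "keyboard"
        args["trigger"] = trigger
    if delay_seconds is not None:
        # Assumption: the server expects an absolute time in Unix seconds
        base = time.time() if now is None else now
        args["timestamp"] = int(base + delay_seconds)
    return args


if __name__ == "__main__":
    print(scheduler_create_args(trigger="keyboard"))
    print(scheduler_create_args(delay_seconds=3))
```

The first call corresponds to "Take a picture when you see the keyboard", the second to "Take a picture after 3 seconds".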

Source code

https://github.com/ronibandini/HuskyLens2MCP

License

All Rights Reserved

Tags HUSKYLENS AI LLM MCP
