Speech and Vision with DFR1154 ESP32-S3 AI Camera

🧪 What if we could give senses to AI?

For the past few years, I’ve been experimenting with ESP32 modules, plugging them into my Alambic framework. The goal? Build a seamless voice ↔️ LLM ↔️ voice loop, in any language, enriched with vision and a wide array of sensors. In short, give senses to artificial intelligence.

🎉 Good news: I finally found the right device, for under $25!

✅ WebSocket communication with Node-RED
✅ Powerful enough for light Edge AI
✅ Compatible with dozens of sensors
✅ And yes… it talks! 🗣️ In any language!

🎥 Real-time demo available (a bit slow for now; blame the MS API latency). With a real-time model, it’ll be much smoother (see the wiki).

I was really impressed that ChatGPT4o could accurately describe the dragon even though the image quality was very poor!

āš™ļø The real challenge?
šŸ‘‰ Taming C++ in the real world: managing Wi-Fi power, avoiding CPU crashes, distributing tasks across cores… every detail matters.
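
To make the "distributing tasks across cores" point concrete, here is a minimal sketch of one common pattern: a capture task hands frames to a network task through a small bounded queue, so a slow upload never stalls capture. This is a host-side model in standard C++, not the firmware used in this project. `BoundedQueue` and `run_pipeline` are hypothetical names, and the drop-oldest policy is an assumption. On the ESP32-S3 itself, the two threads would typically be FreeRTOS tasks created with `xTaskCreatePinnedToCore`, one per core.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue: when full, it drops the oldest item so the
// producer (e.g. camera capture) is never blocked by a slow consumer.
template <typename T>
class BoundedQueue {
public:
    // A capacity of 0 is clamped to 1 so push() always has room to work with.
    explicit BoundedQueue(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Adds an item; returns true if an older item was dropped to make room.
    bool push(T item) {
        bool dropped = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (items_.size() >= capacity_) {
                items_.pop_front();
                dropped = true;
            }
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
        return dropped;
    }

    // Blocks until an item is available. Returns false once the queue has
    // been closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable ready_;
    bool closed_ = false;
};

// Runs a producer ("capture") and a consumer ("network") on separate threads
// and returns the frame numbers the consumer actually received, in order.
std::vector<int> run_pipeline(int frame_count, std::size_t capacity) {
    BoundedQueue<int> queue(capacity);
    std::vector<int> received;

    std::thread consumer([&] {
        int frame = 0;
        while (queue.pop(frame)) received.push_back(frame);
    });
    std::thread producer([&] {
        for (int i = 0; i < frame_count; ++i) queue.push(i);
        queue.close();
    });

    producer.join();
    consumer.join();
    return received;
}
```

Calling `run_pipeline(100, 2)` models a fast camera feeding a two-slot queue: some frames may be skipped, but the most recent one always gets through, which is usually what a vision prompt needs.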

🔧 But there's still a major frustration: how does the Gravity connector actually work?

Thanks to DFRobot for the DFR1154, a brilliant board. But… I’m stuck:

- Is the Gravity port UART-only, not I2C? If so, why such a limitation?
- I tried connecting the SEN0539 (for KWS) and the DFR0997 (for camera), with no success.
- There's no way to wire a button like the DFR0785, which could trigger speech events.

I posted a detailed question on the DFRobot forum, but I'm still waiting for answers.

šŸ™ If you have any tip, workaround, or code snippet to make the Gravity port more useful — I’m all ears.

License
All Rights Reserved