
Smart Voice Controlled Bluetooth Speaker Using ESP32

A DIY Bluetooth speaker using ESP32 with built-in voice recognition that lets you control music playback and volume using voice commands.

 

 

 

Story

 

Ever found yourself with messy hands while cooking, deep in a project, working out, or singing in the shower, wishing you could control your music without touching anything? That's exactly why I built this voice-controlled speaker.

 

While smart speakers like Amazon Echo, Google Home, and Apple HomePod have transformed how we interact with music, they all require internet connectivity and cloud processing to function - meaning no connection, no music control.

 

This project takes a different approach: a smart speaker that processes voice commands completely offline using DFRobot's Offline Language Learning Voice Recognition Sensor. The ESP32 microcontroller does double duty, handling both Bluetooth audio streaming and voice commands, while the MAX98357A I2S amplifier provides high-quality sound output.

 

What sets this project apart is its independence and simplicity. Once programmed, it works like any Bluetooth speaker but responds to natural voice commands like "play music," "stop playing," or "volume up" without needing an app or an internet connection. Voice recognition happens on-device, which keeps response times short and your voice data private.

 

 

Hardware Required

 

 

1. ESP32 Development Board

2. DFRobot DF2301Q Voice Recognition Module 

 

  

 

3. DFRobot MAX98357A I2S Audio Amplifier 

 

 

 

 

4. Speaker (8 ohms recommended)

5. Power Supply (5V)

6. Connecting Wires

7. Project Box/Enclosure (optional)

 

 

 

 

 

Pin Connections

 

 

 

Voice Recognition Module (DF2301Q)

 

RX - GPIO16 (ESP32)

TX - GPIO17 (ESP32)

VCC - 5V

GND - GND

 

 

 

Audio Amplifier (MAX98357A)

 

 

BCLK - GPIO25

LRCLK - GPIO26

DIN - GPIO14

VCC - 5V

GND - GND

 

 

 

 

Software Dependencies

 

Install the two required libraries:

1. DFRobot_DF2301Q library for the voice recognition module

2. DFRobot_MAX98357A library for the amplifier module

Both are included at the top of the sketch:

 

#include <DFRobot_MAX98357A.h>
#include "DFRobot_DF2301Q.h"

 

 

 

 

How It Works

 

Voice Recognition Communication

 

The DF2301Q voice recognition module communicates with the ESP32 over UART. The module also supports I2C, but UART was chosen because it is simpler to set up. The connection requires just two data pins (TX and RX) plus power and ground.

 

Learn more about the module and how to use it here

 

// Configure voice recognition sensor on Serial2 for ESP32
DFRobot_DF2301Q_UART DF2301Q(/*hardSerial =*/&Serial2, /*rx =*/16, /*tx =*/17);

 

 

When the module recognizes a voice command, it sends a corresponding command ID (CMDID) through the serial connection. Each command has a unique ID that triggers specific actions:

 

 

// Voice command IDs 
const uint8_t CMD_PLAY = 92;
const uint8_t CMD_STOP = 93;
const uint8_t CMD_PREVIOUS = 94;
const uint8_t CMD_NEXT = 95;
const uint8_t CMD_REPEAT = 96;
const uint8_t CMD_VOLUME_UP = 97;
const uint8_t CMD_VOLUME_DOWN = 98;
const uint8_t CMD_VOLUME_MAX = 99;
const uint8_t CMD_VOLUME_MIN = 100;
const uint8_t CMD_VOLUME_MID = 101;

The main loop continuously monitors for command IDs:

void loop() {
  uint8_t commandID = DF2301Q.getCMDID();
  
  if (commandID != 0) {
    Serial.print("Received command ID: ");
    Serial.println(commandID);
    
    switch (commandID) {
      case CMD_VOLUME_UP:
        if (currentVolume < 9) {
          currentVolume++;
          amplifier.setVolume(currentVolume);
        }
        break;
        // Other cases...
    }
  }
}
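
The volume handling in this switch can also be pulled into a small pure function, which makes the clamping easy to check without any hardware attached. The following is a minimal sketch, not the code the project ships: the function name nextVolume and the k-prefixed constants are hypothetical, the 0 to 9 range is taken from the bounds checked above, and the presets of 1 and 5 mirror the values used in the full listing at the end of this article.

```cpp
#include <cstdint>

// Volume command IDs, copied from the constants listed above.
constexpr uint8_t kCmdVolumeUp   = 97;
constexpr uint8_t kCmdVolumeDown = 98;
constexpr uint8_t kCmdVolumeMax  = 99;
constexpr uint8_t kCmdVolumeMin  = 100;
constexpr uint8_t kCmdVolumeMid  = 101;

// Volume limits (0..9 matches the bounds checked in the loop above).
constexpr int kVolumeFloor   = 0;
constexpr int kVolumeCeiling = 9;

// Presets mirrored from the full listing: "minimum" is 1, "medium" is 5.
constexpr int kPresetMin = 1;
constexpr int kPresetMid = 5;

// Returns the volume that should be applied after a command.
// Non-volume commands leave the volume unchanged.
int nextVolume(uint8_t commandID, int currentVolume) {
  switch (commandID) {
    case kCmdVolumeUp:
      return currentVolume < kVolumeCeiling ? currentVolume + 1 : kVolumeCeiling;
    case kCmdVolumeDown:
      return currentVolume > kVolumeFloor ? currentVolume - 1 : kVolumeFloor;
    case kCmdVolumeMax:
      return kVolumeCeiling;
    case kCmdVolumeMin:
      return kPresetMin;
    case kCmdVolumeMid:
      return kPresetMid;
    default:
      return currentVolume;
  }
}
```

In the sketch, the loop would call nextVolume(commandID, currentVolume) and only call amplifier.setVolume() when the returned value differs from currentVolume.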

 

 

 

Audio System

The MAX98357A amplifier connects to the ESP32 via I2S (Inter-IC Sound), a dedicated digital audio interface. The audio stays digital all the way from the Bluetooth stream to the amplifier, which converts it to an analog signal for the speaker. The ESP32 receives the stream from your phone or computer over Bluetooth A2DP (Advanced Audio Distribution Profile).
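
To get a feel for what the BCLK and LRCLK pins carry, here is a small worked calculation. It assumes 16-bit stereo audio at 44.1 kHz, a common format for Bluetooth audio. This article does not state the stream format, and the amplifier library may configure the bus differently (for example with wider slots), so treat the numbers as an illustration only. The function names are hypothetical.

```cpp
#include <cstdint>

// I2S bit clock: one clock pulse per bit, for every channel of every sample.
constexpr uint32_t i2sBitClockHz(uint32_t sampleRateHz,
                                 uint32_t bitsPerSample,
                                 uint32_t channels) {
  return sampleRateHz * bitsPerSample * channels;
}

// The word-select clock (LRCLK) completes one left/right cycle per sample
// frame, so its frequency equals the sample rate.
constexpr uint32_t i2sWordClockHz(uint32_t sampleRateHz) {
  return sampleRateHz;
}

// Assumed format: 44.1 kHz, 16 bits per sample, 2 channels.
static_assert(i2sBitClockHz(44100, 16, 2) == 1411200,
              "44.1 kHz x 16 bit x 2 channels = 1.4112 MHz");
static_assert(i2sWordClockHz(44100) == 44100,
              "LRCLK runs at the sample rate");
```

Under those assumptions BCLK (GPIO25) runs at about 1.41 MHz and LRCLK (GPIO26) at 44.1 kHz, which is why short, tidy wiring on those lines helps.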

 

 

 

Software Setup

 

Install Required Libraries

 - DFRobot_MAX98357A

 - DFRobot_DF2301Q

 

Arduino IDE Settings

 - Board: ESP32 Dev Module

 - Upload Speed: 115200

 - Flash Frequency: 80MHz

 - CPU Frequency: 240MHz

 

Upload the Code

1. Open the provided code in Arduino IDE

2. Select the correct port

3. Upload to your ESP32

 

 

 

Initial Configuration

 

The setup function initializes both the voice recognition module and amplifier:

 

void setup() {
  Serial.begin(115200);  // Start the debug serial port used by the messages below

  // Initialize voice recognition sensor
  while (!DF2301Q.begin()) {
    Serial.println("Voice sensor initialization failed!");
    delay(3000);
  }

  // Initialize amplifier
  while (!amplifier.begin("Nick Smart Speaker", GPIO_NUM_25, GPIO_NUM_26, GPIO_NUM_14)) {
    Serial.println("Amplifier initialization failed!");
    delay(3000);
  }

  // Configure voice module settings
  DF2301Q.settingCMD(DF2301Q_UART_MSG_CMD_SET_MUTE, 0);  // Unmute
  DF2301Q.settingCMD(DF2301Q_UART_MSG_CMD_SET_VOLUME, 10); // Set recognition volume
  DF2301Q.settingCMD(DF2301Q_UART_MSG_CMD_SET_WAKE_TIME, 10); // Wake time in seconds
}
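
The setup above retries forever if a module never responds. If you would rather give up after a few attempts (for example, to blink an error LED instead), a bounded retry helper is one option. This is a sketch under that assumption and is not part of the project code: retryInit and its parameters are hypothetical names, and the helper is written without Arduino calls so it can be tested on a PC.

```cpp
// Calls `attempt` up to `maxAttempts` times and returns true as soon as it
// succeeds. `wait` runs between failed attempts (not after the last one).
template <typename Attempt, typename Wait>
bool retryInit(Attempt attempt, int maxAttempts, Wait wait) {
  for (int i = 0; i < maxAttempts; ++i) {
    if (attempt()) {
      return true;   // Initialization succeeded
    }
    if (i + 1 < maxAttempts) {
      wait();        // Pause before the next try
    }
  }
  return false;      // Every attempt failed
}
```

In the sketch this could be called as retryInit([] { return DF2301Q.begin(); }, 5, [] { delay(3000); }), with the return value deciding whether to continue or report an error.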

 

 

 

Voice Commands

The system recognizes these commands:

 

 - "Play Music" - Start playback

 - "Stop" - Stop playback

 - "Next track" - Skip to next track

 - "Previous Track" - Go to previous track

 - "Volume Up" - Increase volume

 - "Volume Down" - Decrease volume

 - "Change Volume to Maximum " - Set volume to maximum

 - "Change Volume to Minimum " - Set volume to minimum

 - "Change Volume to Medium " - Set volume to middle level
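
For debugging, it can help to print the spoken phrase next to the numeric ID in the serial monitor. The lookup below is a minimal sketch: the struct and function names are hypothetical, and pairing each phrase with the ID of the similarly named constant (for example "Play Music" with CMD_PLAY = 92) is my reading of the lists in this article rather than something taken from the module's documentation.

```cpp
#include <cstdint>

// One entry per phrase listed above, keyed by the command ID from the sketch.
struct VoiceCommand {
  uint8_t id;
  const char* phrase;
};

constexpr VoiceCommand kVoiceCommands[] = {
  {92,  "Play Music"},
  {93,  "Stop"},
  {94,  "Previous Track"},
  {95,  "Next track"},
  {97,  "Volume Up"},
  {98,  "Volume Down"},
  {99,  "Change Volume to Maximum"},
  {100, "Change Volume to Minimum"},
  {101, "Change Volume to Medium"},
};

// Returns the phrase for a command ID, or nullptr if the ID is not in the table.
const char* phraseForCommand(uint8_t id) {
  for (const VoiceCommand& cmd : kVoiceCommands) {
    if (cmd.id == id) {
      return cmd.phrase;
    }
  }
  return nullptr;
}
```

Inside loop(), the result can be passed to Serial.println() after checking that it is not nullptr.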

 

 

 

Troubleshooting

 

 

 

Voice Recognition Issues

 - Ensure you're speaking clearly and within 1 meter of the device

 - Check that the TX/RX pins are correctly connected

 - Verify Serial2 initialization in the code

 - Check the serial monitor for command ID feedback

 

 

 

Audio Issues

 - Verify I2S pin connections

 - Check speaker connections and impedance

 - Ensure the Bluetooth device is properly paired

 - Monitor serial output for initialization success

 

 

 

Connection Problems

 - Reset both the ESP32 and the Bluetooth device

 - Check power supply stability

 - Verify all ground connections

 - Monitor serial output for debugging information

A startup sound plays once initialization succeeds.

 

 

 

Operation Guide

1. Power on the device

2. Wait for the initialization confirmation. The device will appear as "Nick Smart Speaker" in your Bluetooth settings

3. Pair with your device

4. Use voice commands to control playback and volume

 

 

 

Future Enhancements

 - Implement playlist control

 - Add the ability to play music from an SD card

 - Add LED indicators for visual feedback

 - Develop a mobile app for additional control

 

 

 

Credits

Special thanks to DFRobot for providing the components used in this project.

 

 

 

Contribution and Collaboration

Want to help make this project even better? Join in! Whether you have ideas for new features, improvements, or just want to collaborate, your contributions are welcome. Feel free to fork the project, make changes, and submit them. Let us build something awesome together!

 

GitHub link: https://github.com/tech-nickk/Smart-Voice-controlled-Bluetooth-Speaker

 

Don't forget to leave a like

 

Thank you :)

 


CODE
#include <DFRobot_MAX98357A.h>
#include "DFRobot_DF2301Q.h"
 
// Create amplifier instance
DFRobot_MAX98357A amplifier;
 
// Configure voice recognition sensor: Serial2 on ESP32, Serial1 on other boards
#if defined(ESP32)
  DFRobot_DF2301Q_UART DF2301Q(/*hardSerial =*/&Serial2, /*rx =*/16, /*tx =*/17);
#else
  DFRobot_DF2301Q_UART DF2301Q(/*hardSerial =*/&Serial1);
#endif
 
 
 
// Voice command IDs 
const uint8_t CMD_PLAY = 92;
const uint8_t CMD_STOP = 93;
const uint8_t CMD_PREVIOUS = 94;
const uint8_t CMD_NEXT = 95;
const uint8_t CMD_REPEAT = 96;
const uint8_t CMD_VOLUME_UP = 97;
const uint8_t CMD_VOLUME_DOWN = 98;
const uint8_t CMD_VOLUME_MAX = 99;
const uint8_t CMD_VOLUME_MIN = 100;
const uint8_t CMD_VOLUME_MID = 101;
 
 
// Current volume level
int currentVolume = 5;
 
void setup() {
  Serial.begin(115200);
 
  // Initialize voice recognition sensor
  while (!DF2301Q.begin()) {
    Serial.println("Voice sensor initialization failed!");
    delay(3000);
  }
  Serial.println("Voice sensor initialized successfully!");
 
  // Initialize amplifier
  while (!amplifier.begin("Nick Smart Speaker", GPIO_NUM_25, GPIO_NUM_26, GPIO_NUM_14)) {
    Serial.println("Amplifier initialization failed!");
    delay(3000);
  }
  Serial.println("Amplifier initialized successfully!");
 
  // Set initial volume
  amplifier.setVolume(currentVolume);
 
  // Initial voice module settings
  DF2301Q.settingCMD(DF2301Q_UART_MSG_CMD_SET_MUTE, 0);  // Unmute
  DF2301Q.settingCMD(DF2301Q_UART_MSG_CMD_SET_VOLUME, 10); // Set voice recognition volume
  DF2301Q.settingCMD(DF2301Q_UART_MSG_CMD_SET_WAKE_TIME, 10); // Wake time in seconds
  
  // Play startup sound
  DF2301Q.playByCMDID(23);  // You can change this ID to any appropriate sound
}
 
void loop() {
  // Get voice command ID
  uint8_t commandID = DF2301Q.getCMDID();
  
  // Process voice commands
  if (commandID != 0) {
    Serial.print("Received command ID: ");
    Serial.println(commandID);
    
    // Execute command based on ID
    switch (commandID) {
      case CMD_PLAY:
        Serial.println("Command: Play");
        esp_avrc_ct_send_passthrough_cmd(0, ESP_AVRC_PT_CMD_PLAY, ESP_AVRC_PT_CMD_STATE_PRESSED);
        break;
        
      case CMD_STOP:
        Serial.println("Command: Stop");
        esp_avrc_ct_send_passthrough_cmd(0, ESP_AVRC_PT_CMD_STOP, ESP_AVRC_PT_CMD_STATE_PRESSED);
        break;
        
      case CMD_NEXT:
        Serial.println("Command: Next Track");
        esp_avrc_ct_send_passthrough_cmd(0, ESP_AVRC_PT_CMD_FORWARD, ESP_AVRC_PT_CMD_STATE_PRESSED);
        break;
        
      case CMD_PREVIOUS:
        Serial.println("Command: Previous Track");
        esp_avrc_ct_send_passthrough_cmd(0, ESP_AVRC_PT_CMD_BACKWARD, ESP_AVRC_PT_CMD_STATE_PRESSED);
        break;
        
      case CMD_VOLUME_UP:
        if (currentVolume < 9) {
          currentVolume++;
          amplifier.setVolume(currentVolume);
          Serial.print("Volume increased to: ");
          Serial.println(currentVolume);
        }
        break;
        
      case CMD_VOLUME_DOWN:
        if (currentVolume > 0) {
          currentVolume--;
          amplifier.setVolume(currentVolume);
          Serial.print("Volume decreased to: ");
          Serial.println(currentVolume);
        }
        break;
 
      case CMD_VOLUME_MAX:
        if (currentVolume < 9) {
          currentVolume = 9;
          amplifier.setVolume(currentVolume);
          Serial.print("Volume set to maximum: ");
          Serial.println(currentVolume);
        }
        break;
 
      case CMD_VOLUME_MIN:
        // The lowest voice-selectable level is 1, not 0
        currentVolume = 1;
        amplifier.setVolume(currentVolume);
        Serial.print("Volume set to minimum: ");
        Serial.println(currentVolume);
        break;
 
      case CMD_VOLUME_MID:
        currentVolume = 5;
        amplifier.setVolume(currentVolume);
        Serial.print("Volume set to medium: ");
        Serial.println(currentVolume);
        break;
    }
  }
  
  delay(100);  // Small delay to prevent overwhelming the system
}
License
All Rights Reserved