
Machine Medic: AI-Driven Sound Signature Analysis for Industrial Machines

Machine Medic is an AI-powered solution that analyzes machine sound signatures to detect anomalies and predict potential failures, helping industries prevent costly downtime.

This is a submission for the UNIHIKER IoT Challenge,
under Category 3 - AI Gadget.
 

HARDWARE LIST
1 UNIHIKER - IoT Python Single Board Computer with Touchscreen

Introduction
It all began during a visit to a food processing plant. As I walked through the facility, I was struck by the hum of machinery and conveyor belts moving products from one point to another. While talking to the operators, something unexpected stood out: the massive losses they face due to machine downtimes. A single fault in a conveyor belt, a gearbox, or a pump could bring an entire production line to a halt, resulting in costly delays. It made me think, what if there was a way to predict these issues before they happened?

The idea hit me right there, sparked by the very machines in front of me, particularly the conveyor belts, which play a crucial role in many industries. Countless machines rely on conveyor systems to move products efficiently, and if one of them fails, it can cost a company hours, even days, of lost productivity.

That’s when I decided to pursue a solution: predictive maintenance using sound. Machines, like humans, have their own “heartbeat” in the form of sound. Every machine has a distinct audio signature, and deviations from that sound could indicate a problem. By analyzing these sounds in real-time, we could identify potential issues and fix them before they cause significant downtime.

With that spark of inspiration, I set out to create Machine Medic, a cloud-based, AI-driven platform that listens to machines and detects faults early, helping industries avoid unnecessary downtime and stay ahead of costly repairs. Let’s dive into how this works, step by step.


Setting up the “MEDIC”

For this project, we're using the UNIHIKER from DFRobot. While DFRobot offers comprehensive documentation to help you get started, I'll also provide some quick tips to guide you along the way. I used a Jupyter notebook to make the best use of AI in building the sound model.
Here is the link to the documentation: click here




 


 

After setting up the venv, it is time to collect data for processing and training the model.
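Before moving on, it's worth making sure the venv actually has everything we will use later. Here is a minimal sanity-check sketch; the package list is simply inferred from the imports that appear in the code later in this article:

CODE
# Minimal environment check; the package list is inferred from the imports
# used later in this article (numpy, librosa, tensorflow, matplotlib, unihiker).
import importlib

for pkg in ["numpy", "librosa", "tensorflow", "matplotlib", "unihiker"]:
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: OK ({getattr(mod, '__version__', 'version unknown')})")
    except ImportError:
        print(f"{pkg}: MISSING - install it inside the venv before continuing")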

Data collection:
For the data collection process, I faced a unique challenge: there were no pre-existing datasets specifically focused on machine sounds, particularly the distinct audio signals from their moving parts, so most of my time went into data collection. To build a robust AI model, I had to gather sound samples myself. I recorded audio from the key moving components of machines in three different conditions: Good, Bad, and Old. These categories represent varying states of machine health, which allowed me to create a diverse and meaningful dataset for training the model.

I made one-minute recordings at each part of each category of machine, which added up to a total of about five hours of audio.

These are the different parts of a moving machine:


 

 

This is customizable: anyone building a project for a specific machine just needs to follow the same steps.

Here are some recordings of a particular part of the chain 
 

To collect data, the following code needs to be run in the venv created using the Jupyter notebook.

CODE
# -*- coding: utf-8 -*-
import time
from unihiker import Audio, GUI

u_gui = GUI()
audio = Audio()

text = None
buttonA = None

num_files = 5               # number of recordings per session
recording_duration = 60     # seconds per recording

def start_recording():
    """Record num_files clips of recording_duration seconds each."""
    global text, buttonA
    # Disable the button so a second tap cannot start an overlapping session
    buttonA.config(state="disabled")

    for i in range(1, num_files + 1):
        # Create the status label on first use, then just update it afterwards
        if text:
            text.config(text=f"Recording {i}/{num_files} started...\nPlease wait...")
        else:
            text = u_gui.draw_text(text=f"Recording {i}/{num_files} started...\nPlease wait...", x=25, y=150, font_size=15, color='#FFFFFF')

        # Record one clip, blocking for the full duration before saving it
        file_name = f'machine_sound_{i}.wav'
        audio.start_record(file_name)
        time.sleep(recording_duration)
        audio.stop_record()

        text.config(text=f"Recording {i}/{num_files} completed. File saved as {file_name}")
        time.sleep(3)

    # Allow a new recording session once all files are saved
    buttonA.config(state="normal")

# Background image, recording button, and title for the touchscreen interface
background_image = u_gui.draw_image(image="1.png", x=0, y=0)
buttonA = u_gui.add_button(text="Start Recording", x=25, y=100, w=190, h=40, onclick=start_recording)
title_text = u_gui.draw_text(text='Machine Medic - Data Collection', y=50, font_size=22, color='#FFFFFF')

# Keep the GUI running
while True:
    time.sleep(0.1)

 
This is the interface that runs and collects sound data; place the board at different parts of a machine to record each one.
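Once a batch of recordings is done, the .wav files need to end up in labeled folders for training. Here is a minimal sketch of how that could be scripted; the file pattern matches the recorder above, and the folder names (Good, Bad, Old) are the condition labels used later for training:

CODE
# Sketch: move one batch of recordings into a labeled training folder.
# Assumes the recorder above produced machine_sound_1.wav ... machine_sound_5.wav
# in the current directory, and that this whole batch belongs to one condition.
import glob
import os
import shutil

label = "Good"  # change to "Bad" or "Old" for the other batches
target_dir = os.path.join("dataset", "machinesounds", "train", label)
os.makedirs(target_dir, exist_ok=True)

for wav in sorted(glob.glob("machine_sound_*.wav")):
    shutil.move(wav, os.path.join(target_dir, wav))
    print(f"moved {wav} -> {target_dir}")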


Next part: Training the AI model 

Arrange the dataset in the root folder.
Here is how it should look.
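As a plain-text reference, this is the layout implied by the paths used in the training code below; the .wav file names are just illustrative:

CODE
/root/
└── dataset/
    └── machinesounds/
        ├── train/
        │   ├── Good/   (e.g. good_part1.wav, good_part2.wav, ...)
        │   ├── Bad/
        │   └── Old/
        └── test/
            ├── Good/
            ├── Bad/
            └── Old/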



 



Here is a schema for the arrangement of the audio files.
 

 

After arranging the dataset comes the “tough” part, but I will make it easy for you.


Training the model

This code implements an AI model to analyze machine sound data using MFCC (Mel-frequency cepstral coefficients) as features. It goes through the steps of loading the data, preprocessing it, building a neural network model, training it, and evaluating its performance.

The code given below is self-explanatory, as I have added many comments to it.

CODE
import os
import numpy as np
import librosa
import tensorflow as tf
import matplotlib.pyplot as plt

# Step 1: Load the Data
def load_data(base_path):
    X = []
    y = []
    # Loop through each category
    for label in os.listdir(base_path):
        label_path = os.path.join(base_path, label)
        if os.path.isdir(label_path):
            # Loop through each audio file in the category
            for file in os.listdir(label_path):
                if file.endswith('.wav'):
                    file_path = os.path.join(label_path, file)
                    # Load the audio file
                    signal, sr = librosa.load(file_path, sr=None)
                    # Extract features (MFCC)
                    mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)  # extract 40 MFCCs per frame
                    mfccs = np.mean(mfccs.T, axis=0)  # average over time to get a fixed-size vector
                    X.append(mfccs)
                    y.append(label)
    return np.array(X), np.array(y)

# Define the base paths where your training and testing data are stored
train_base_path = "/root/dataset/machinesounds/train"  # Update this path
test_base_path = "/root/dataset/machinesounds/test"     # Update this path

# Load the data
X_train, y_train = load_data(train_base_path)
X_test, y_test = load_data(test_base_path)

# Step 2: Preprocess the Data
# Encode the labels manually
unique_labels = np.unique(y_train)  # Use y_train for encoding
label_to_index = {label: idx for idx, label in enumerate(unique_labels)}
y_train_encoded = np.array([label_to_index[label] for label in y_train])
y_test_encoded = np.array([label_to_index[label] for label in y_test])  # Encode test labels

# Reshape for the model
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Step 3: Build the Model
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, kernel_size=2, activation='relu', input_shape=(X_train.shape[1], 1)),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(32, kernel_size=2, activation='relu'),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(len(unique_labels), activation='softmax')
])

# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Step 4: Train the Model
history = model.fit(X_train, y_train_encoded, epochs=50, validation_data=(X_test, y_test_encoded))

# Step 5: Evaluate the Model
# Predictions
y_pred = np.argmax(model.predict(X_test), axis=-1)

# Calculate accuracy
accuracy = np.sum(y_pred == y_test_encoded) / len(y_test_encoded)
print(f'Accuracy: {accuracy * 100:.2f}%')

# Plotting the training history
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Test Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

Data Loading
 

The function load_data(base_path) reads audio files from a directory structure where each folder represents a label. It processes the sound files and extracts MFCC features:

MFCC (Mel-frequency cepstral coefficients): This is a feature used to represent the power spectrum of audio signals, capturing relevant sound characteristics. The function loops through the directories, loads .wav files using Librosa, computes MFCCs for each file, and averages them over time to get a fixed-size feature vector.
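To make the feature shape concrete, here is a minimal sketch of what this extraction produces for a single recording (the file name is just a placeholder):

CODE
import librosa
import numpy as np

# Load one recording and extract 40 MFCC coefficients per frame,
# then average over time to get a fixed-size feature vector.
signal, sr = librosa.load("machine_sound_1.wav", sr=None)
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
print(mfccs.shape)                 # (40, n_frames), varies with clip length
feature = np.mean(mfccs.T, axis=0)
print(feature.shape)               # (40,), the same size for every clip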

 

 

Preprocessing Technique

MFCC Extraction: The primary preprocessing step converts raw audio signals into MFCCs, which condense the key characteristics of the sound into a format the model can process.
Label Encoding: Labels are transformed from strings to numerical values to make them compatible with the model's output format (see the sketch below for keeping this mapping consistent at inference time).
Train-Test Split: Keeping separate train and test folders ensures the model is evaluated on unseen data, which helps reveal overfitting.
Reshaping: The feature vectors are reshaped to fit the input requirements of the 1D convolutional neural network.
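One practical tip: np.unique() returns the labels in sorted order, so it helps to save that order next to the model and load it again at inference time instead of retyping the list by hand. A minimal sketch, run right after the training code above (the JSON file name is my own choice):

CODE
import json
import numpy as np

# Persist the label order produced by np.unique(y_train) during training,
# so the inference script can map prediction indices back to label names.
unique_labels = np.unique(y_train)            # sorted, e.g. ['Bad', 'Good', 'Old']
with open("label_order.json", "w") as f:
    json.dump(unique_labels.tolist(), f)

# At inference time (e.g. in the GUI script shown later):
with open("label_order.json") as f:
    unique_labels = json.load(f)

The GUI script later in this article could then read label_order.json instead of hardcoding the label list.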

Once your model is ready:
 



Save the model in the same directory using this code snippet:

CODE
model.save('my_trained_model.h5')

The model will be saved as my_trained_model.h5 in the same directory.
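A quick way to confirm the saved file is usable is to load it back and print its summary (a minimal sketch):

CODE
import tensorflow as tf

# Reload the saved model and print its architecture as a quick sanity check.
reloaded = tf.keras.models.load_model("my_trained_model.h5")
reloaded.summary()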
 


Once you are done, the rest of the project focuses on building a GUI that triggers the model and gives an output for the input sound.

The final step: setting up the GUI and a backend to manage the model

Using the Jupyter notebook, create a main file in the root directory and run the following code.

Make sure the model name matches the one given in the code. The model should be present in the same directory.
 

CODE
# -*- coding: utf-8 -*-
import os
import numpy as np
import librosa
import tensorflow as tf
from pinpong.extension.unihiker import *
from unihiker import Audio, GUI
import time
import threading  # Import threading for non-blocking execution

# Load the pre-trained model
model_path = '/root/my_trained_model.h5'  # Update this path to your model
loaded_model = tf.keras.models.load_model(model_path)

# Load unique labels for prediction
unique_labels = ["Bad", "Good", "Old"]  # Must match the sorted order from np.unique(y_train) at training time; update with your actual labels

# Function to predict the machine state
def predict_audio(file_path):
    signal, sr = librosa.load(file_path, sr=None)
    mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
    mfccs = np.mean(mfccs.T, axis=0)
    mfccs = mfccs.reshape(1, -1, 1)  # Reshape for the model
    prediction = np.argmax(loaded_model.predict(mfccs), axis=-1)
    return unique_labels[prediction[0]]

# Instantiate a GUI object
u_gui = GUI()

# Instantiate audio
audio = Audio()  # Create an instance of the audio class

# Variable to hold the message label
text = None
result_text = None  # New variable to hold the result text

# Create a variable to hold the button
buttonA = None

# Callback function for the Start Recording button
def start_recording():
    global text, buttonA  # Reference global variables
    
    # Disable the button to prevent multiple clicks
    buttonA.config(state="disabled")  

    # Start the countdown in a separate thread
    countdown_thread = threading.Thread(target=countdown, args=(3,))
    countdown_thread.start()  # Start the thread

def countdown(t):
    global text
    # Show countdown
    for i in range(t, 0, -1):
        text.config(text=str(i).upper(), font_size=100, x=50, color='#FFFFFF')  # Update countdown text
        time.sleep(1)  # Delay for 1 second

    # After countdown, update the text to indicate recording status
    text.config(text="RECORDING STARTED...\nPLEASE WAIT...", font_size=15,x=10, color='#FFFFFF')
    
    # Start recording
    audio.start_record('machine_sound.wav')  # Start recording

    # Record for 15 seconds
    time.sleep(15)  # Recording duration
    audio.stop_record()  # Stop recording
    text.config(text="RECORDING STOPPED.", font_size=15, x=10, color='#FFFFFF')  # Update message

    # Wait for 3 seconds
    time.sleep(3)  # Wait before processing audio
    text.config(text="PLEASE WAIT.", font_size=15, x=10, color='#FFFFFF') 
    process_audio()  # Process the audio

def process_audio():
    predicted_state = predict_audio('machine_sound.wav')  # Predict the machine state
    
    global result_text
    if result_text:  # If result text already exists, update it
        result_text.config(text=f"MACHINE STATE: {predicted_state.upper()}")
    else:  # Create result text if it doesn't exist
        result_text = u_gui.draw_text(text=f"MACHINE STATE: {predicted_state.upper()}", x=10, y=200, font_size=15, color='#FFFFFF')  # Center it

    # Re-enable the button after the processing is complete
    buttonA.config(state="normal")  

# Load the background image
# Ensure '1.png' is present in the same directory as your script
background_image = u_gui.draw_image(image="1.png", x=0, y=0)  # Set to cover the entire screen

# Create the Start Recording button and store it in the global variable
buttonA = u_gui.add_button(text="Start Recording", x=25, y=100, w=190, h=40, onclick=start_recording)

# Display the title of the application on top of the background
title_text = u_gui.draw_text(text='MACHINE MEDIC', y=50, font_size=20, color='#FFFFFF')  # Adjust the y-position as needed

# Initialize message text for countdown and recording status
text = u_gui.draw_text(text="", x=10, y=150, font_size=15, color='#FFFFFF')  # Placeholder for message updates

# Main loop to keep the GUI running
while True:
    # Prevent the program from exiting or getting stuck
    time.sleep(0.1)

As of now, the model uses the built-in microphone on the UNIHIKER board for small-scale sound analysis. It is designed to monitor conveyor chains that operate continuously. To expand this project to larger conveyor lines, we plan to set up multiple microphones along the length of the conveyor, connected to the UNIHIKER for real-time sound monitoring and analysis of different parts of the system.

 

STEP 1
Click on Start Recording.

 

STEP 2
Wait for the countdown and place the device near the part of the machine you want to check.

 

STEP 3
Once the recording stops after 15 seconds, the model starts predicting.

 

The model has predicted that the machine is in bad condition based on the sound, yaaay

In the future, the goal is to integrate this solution into fully operational machines across various parts of the production line. By applying edge computing, we can process sound data locally, right at the machine site, reducing latency and improving real-time fault detection capabilities. This will not only help prevent machine downtime on a larger scale but also make the system scalable across different industries.

License
All Rights Reserved