Machine Medic: AI-Driven Sound Signature Analysis for Industrial Machines
Machine Medic is an AI-powered solution that analyzes machine sound signatures to detect anomalies and predict potential failures, helping industries prevent costly downtime.
This is a submission for the UNIHIKER IoT Challenge, under Category 3 - AI Gadget.
Introduction
It all began during a visit to a food processing plant. As I walked through the facility, I was struck by the hum of machinery and conveyor belts moving products from one point to another. While talking to the operators, something unexpected stood out: the massive losses they face due to machine downtimes. A single fault in a conveyor belt, a gearbox, or a pump could bring an entire production line to a halt, resulting in costly delays. It made me think, what if there was a way to predict these issues before they happened?
The idea hit me right there, sparked by the very machines in front of me, particularly the conveyor belts, which play a crucial role in many industries. Countless machines across industries rely on conveyor systems to move products efficiently, and if one of them fails, it can cost companies hours, even days, of lost productivity.
That’s when I decided to pursue a solution: predictive maintenance using sound. Machines, like humans, have their own “heartbeat” in the form of sound. Every machine has a distinct audio signature, and deviations from that sound could indicate a problem. By analyzing these sounds in real-time, we could identify potential issues and fix them before they cause significant downtime.
With that spark of inspiration, I set out to create Machine Medic, a cloud-based, AI-driven platform that listens to machines and detects faults early, helping industries avoid unnecessary downtime and stay ahead of costly repairs. Let’s dive into how this works, step by step.
Setting up the “MEDIC”
For this project, we're using the UNIHIKER from DFRobot. While they offer comprehensive documentation to help you get started, I'll also provide some quick tips to guide you along the way. I worked in the Jupyter notebook on the board to make the best use of AI in building the sound model.
Here is the link to the documentation: click here
After setting up the venv, it is time to collect data for preprocessing and training the model.
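Before moving on, it helps to confirm that the libraries used later in this write-up are available inside the venv. This is just a quick sanity check to run in a notebook cell; it assumes librosa, TensorFlow, matplotlib and the unihiker library are already installed (install them with pip inside the venv if any import fails):
# Quick sanity check for the venv used in the notebook.
# All of these libraries are used by the scripts later in this project.
import librosa            # audio loading and MFCC extraction
import tensorflow as tf   # model building and training
import matplotlib         # plotting the training history
from unihiker import Audio, GUI  # on-board recording and GUI

print("librosa:", librosa.__version__)
print("tensorflow:", tf.__version__)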
Data collection:
For the data collection process, I faced a unique challenge: there were no pre-existing datasets focused specifically on machine sounds, particularly the distinct audio signals from their moving parts, so most of my time went into data collection. To build a robust AI model, I had to gather the sound samples myself. I recorded audio from the key moving components of machines in three different conditions: Good, Bad, and Old. These categories represent varying states of machine health, which allowed me to create a diverse and meaningful dataset for training the model.
I made a one-minute sound recording at each part of the machine for each category, which came to a total of about 5 hours of audio.
These are different parts of a moving machine
This is customizable: anyone building a project for a specific machine needs to follow the same steps.
Here are some recordings of a particular part of the chain
To collect the data, run the following code in the venv you created, from the Jupyter notebook:
# -*- coding: utf-8 -*-
import os
import time
from unihiker import Audio, GUI
u_gui = GUI()
audio = Audio()
text = None
buttonA = None
num_files = 5
recording_duration = 60 # 60 seconds
def start_recording():
global text, buttonA
buttonA.config(state="disabled")
for i in range(1, num_files + 1):
if text:
text.config(text=f"Recording {i}/{num_files} started...\nPlease wait...")
else:
text = u_gui.draw_text(text=f"Recording {i}/{num_files} started...\nPlease wait...", x=25, y=150, font_size=15, color='#FFFFFF')
file_name = f'machine_sound_{i}.wav'
audio.start_record(file_name)
time.sleep(recording_duration)
audio.stop_record()
text.config(text=f"Recording {i}/{num_files} completed. File saved as {file_name}")
time.sleep(3)
buttonA.config(state="normal")
background_image = u_gui.draw_image(image="1.png", x=0, y=0)
buttonA = u_gui.add_button(text="Start Recording", x=25, y=100, w=190, h=40, onclick=start_recording)
title_text = u_gui.draw_text(text='Machine Medic - Data Collection', y=50, font_size=22, color='#FFFFFF')
while True:
time.sleep(0.1)
This is the interface that runs on the UNIHIKER and collects sound data as the board is placed at different parts of a machine. The recorded .wav files can then be sorted into the label folders described in the next section.
Next part: Training the AI model
Arrange the dataset in the root-folder
Here is how it should look.
Here is a schema for arrangement of the Audio files.
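In text form, the arrangement the training script below expects looks roughly like this (the base path /root/dataset/machinesounds and the train/test split come straight from the code, the label folder names are the three categories, and the .wav file names are just examples):
/root/dataset/machinesounds/
    train/
        Good/
            machine_sound_1.wav ...
        Bad/
            machine_sound_1.wav ...
        Old/
            machine_sound_1.wav ...
    test/
        Good/ ...
        Bad/ ...
        Old/ ...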
After arranging the dataset, here comes the “tough” but “I will make it easy for you” task:
Training the model
This code implements an AI model to analyze machine sound data using MFCC (Mel-frequency cepstral coefficients) as features. It goes through the steps of loading the data, preprocessing it, building a neural network model, training it, and evaluating its performance.
The code given below is self-explanatory, as I have added many comments to it.
import os
import numpy as np
import librosa
import tensorflow as tf
import matplotlib.pyplot as plt
# Step 1: Load the Data
def load_data(base_path):
X = []
y = []
# Loop through each category
for label in os.listdir(base_path):
label_path = os.path.join(base_path, label)
if os.path.isdir(label_path):
# Loop through each audio file in the category
for file in os.listdir(label_path):
if file.endswith('.wav'):
file_path = os.path.join(label_path, file)
# Load the audio file
signal, sr = librosa.load(file_path, sr=None)
# Extract features (MFCC)
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40) # Corrected line
mfccs = np.mean(mfccs.T, axis=0)
X.append(mfccs)
y.append(label)
return np.array(X), np.array(y)
# Define the base paths where your training and testing data are stored
train_base_path = "/root/dataset/machinesounds/train" # Update this path
test_base_path = "/root/dataset/machinesounds/test" # Update this path
# Load the data
X_train, y_train = load_data(train_base_path)
X_test, y_test = load_data(test_base_path)
# Step 2: Preprocess the Data
# Encode the labels manually
unique_labels = np.unique(y_train) # Use y_train for encoding
label_to_index = {label: idx for idx, label in enumerate(unique_labels)}
y_train_encoded = np.array([label_to_index[label] for label in y_train])
y_test_encoded = np.array([label_to_index[label] for label in y_test]) # Encode test labels
# Reshape for the model
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
# Step 3: Build the Model
model = tf.keras.Sequential([
tf.keras.layers.Conv1D(16, kernel_size=2, activation='relu', input_shape=(X_train.shape[1], 1)),
tf.keras.layers.MaxPooling1D(pool_size=2),
tf.keras.layers.Conv1D(32, kernel_size=2, activation='relu'),
tf.keras.layers.MaxPooling1D(pool_size=2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(len(unique_labels), activation='softmax')
])
# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Step 4: Train the Model
history = model.fit(X_train, y_train_encoded, epochs=50, validation_data=(X_test, y_test_encoded))
# Step 5: Evaluate the Model
# Predictions
y_pred = np.argmax(model.predict(X_test), axis=-1)
# Calculate accuracy
accuracy = np.sum(y_pred == y_test_encoded) / len(y_test_encoded)
print(f'Accuracy: {accuracy * 100:.2f}%')
# Plotting the training history
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Test Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
Data Loading
The function load_data(base_path) reads audio files from a directory structure where each folder represents a label. It processes the sound files and extracts MFCC features:
MFCC (Mel-frequency cepstral coefficients): a feature that represents the power spectrum of an audio signal, capturing the relevant sound characteristics. The loader loops through the directories, loads the .wav files using Librosa, computes MFCCs for each file, and averages them over time to get a fixed-size feature vector.
Preprocessing Technique
MFCC extraction: the primary preprocessing step converts the raw audio signals into MFCCs, which condense the key characteristics of the sound into a format the model can process.
Label encoding: labels are transformed from strings to numerical values to make them compatible with the model's output format.
Train-test split: ensures the model is tested on unseen data, preventing overfitting.
Reshaping: reshapes the feature vectors to fit the input requirements of a 1D convolutional neural network.
Once your model is ready, save it in the same directory using this code snippet:
model.save('my_trained_model.h5')
The model will be saved as my_trained_model.h5 in the same directory.
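Before wiring everything into a GUI, it's worth a quick sanity check that the saved model can be reloaded and used on a single clip. This is only a sketch: the file names (my_trained_model.h5 and one of the recorded .wav files) are assumptions based on the earlier steps, and the label order must match what np.unique(y_train) produced during training, which is alphabetical (Bad, Good, Old for these three categories):
import numpy as np
import librosa
import tensorflow as tf

# Reload the model saved above
loaded_model = tf.keras.models.load_model('my_trained_model.h5')

# np.unique sorts labels alphabetically during training, so keep the same order here
unique_labels = ["Bad", "Good", "Old"]

# Run one recorded clip through the same MFCC pipeline used for training
signal, sr = librosa.load('machine_sound_1.wav', sr=None)
mfccs = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40).T, axis=0)
mfccs = mfccs.reshape(1, -1, 1)  # (batch, features, channels)

prediction = np.argmax(loaded_model.predict(mfccs), axis=-1)
print("Predicted machine state:", unique_labels[prediction[0]])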
Once you are done, the rest of the project focuses on building a GUI that triggers the model and gives an output for the input sound.
The final run: setting up the GUI and a backend to manage the model
Using the Jupyter notebook, create a main file in the root directory and run the following code.
Make sure the model name matches the one used in the code, and that the model file is present in the same directory.
# -*- coding: utf-8 -*-
import os
import numpy as np
import librosa
import tensorflow as tf
from pinpong.extension.unihiker import *
from unihiker import Audio, GUI
import time
import threading # Import threading for non-blocking execution
# Load the pre-trained model
model_path = '/root/my_trained_model.h5' # Update this path to your model
loaded_model = tf.keras.models.load_model(model_path)
# Class labels for prediction; keep the same alphabetical order that
# np.unique(y_train) produced during training
unique_labels = ["Bad", "Good", "Old"]  # Update this with your actual labels
# Function to predict the machine state
def predict_audio(file_path):
signal, sr = librosa.load(file_path, sr=None)
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)
mfccs = np.mean(mfccs.T, axis=0)
mfccs = mfccs.reshape(1, -1, 1) # Reshape for the model
prediction = np.argmax(loaded_model.predict(mfccs), axis=-1)
return unique_labels[prediction[0]]
# Instantiate a GUI object
u_gui = GUI()
# Instantiate audio
audio = Audio() # Create an instance of the audio class
# Variable to hold the message label
text = None
result_text = None # New variable to hold the result text
# Create a variable to hold the button
buttonA = None
# Callback function for the Start Recording button
def start_recording():
global text, buttonA # Reference global variables
# Disable the button to prevent multiple clicks
buttonA.config(state="disabled")
# Start the countdown in a separate thread
countdown_thread = threading.Thread(target=countdown, args=(3,))
countdown_thread.start() # Start the thread
def countdown(t):
global text
# Show countdown
for i in range(t, 0, -1):
text.config(text=str(i).upper(), font_size=100, x=50, color='#FFFFFF') # Update countdown text
time.sleep(1) # Delay for 1 second
# After countdown, update the text to indicate recording status
text.config(text="RECORDING STARTED...\nPLEASE WAIT...", font_size=15,x=10, color='#FFFFFF')
# Start recording
audio.start_record('machine_sound.wav') # Start recording
# Record for 15 seconds
time.sleep(15) # Recording duration
audio.stop_record() # Stop recording
text.config(text="RECORDING STOPPED.", font_size=15, x=10, color='#FFFFFF') # Update message
# Wait for 3 seconds
time.sleep(3) # Wait before processing audio
text.config(text="PLEASE WAIT.", font_size=15, x=10, color='#FFFFFF')
process_audio() # Process the audio
def process_audio():
predicted_state = predict_audio('machine_sound.wav') # Predict the machine state
global result_text
if result_text: # If result text already exists, update it
result_text.config(text=f"MACHINE STATE: {predicted_state.upper()}")
else: # Create result text if it doesn't exist
result_text = u_gui.draw_text(text=f"MACHINE STATE: {predicted_state.upper()}", x=10, y=200, font_size=15, color='#FFFFFF') # Center it
# Re-enable the button after the processing is complete
buttonA.config(state="normal")
# Load the background image
# Ensure '1.png' is present in the same directory as your script
background_image = u_gui.draw_image(image="1.png", x=0, y=0) # Set to cover the entire screen
# Create the Start Recording button and store it in the global variable
buttonA = u_gui.add_button(text="Start Recording", x=25, y=100, w=190, h=40, onclick=start_recording)
# Display the title of the application on top of the background
title_text = u_gui.draw_text(text='MACHINE MEDIC', y=50, font_size=20, color='#FFFFFF') # Adjust the y-position as needed
# Initialize message text for countdown and recording status
text = u_gui.draw_text(text="", x=10, y=150, font_size=15, color='#FFFFFF') # Placeholder for message updates
# Main loop to keep the GUI running
while True:
# Prevent the program from exiting or getting stuck
time.sleep(0.1)
As of now, the current setup uses the built-in microphone on the UNIHIKER board for smaller-scale sound analysis. The model is designed to monitor conveyor chains that operate continuously. To expand this project to larger conveyor lines, we plan to set up multiple microphones along the length of the conveyor, connected to the UNIHIKER for real-time sound monitoring and analysis of different parts of the system; a rough sketch of what this could look like in software is shown below.
The model has predicted that the machine is in bad condition based on its sound. Yaaay!
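As a rough sketch of how the multi-microphone plan above could look in software, the predict_audio function from the GUI script could simply be run once per recording position. The position names and file names below are placeholders, not part of the current build:
# Sketch only: one recording per microphone position along the conveyor.
# Position names and file names are placeholders for illustration.
positions = {
    "drive motor": "mic_drive_motor.wav",
    "gearbox": "mic_gearbox.wav",
    "return chain": "mic_return_chain.wav",
}

for position, wav_file in positions.items():
    state = predict_audio(wav_file)  # reuses the function defined above
    print(f"{position}: {state}")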
In the future, the goal is to integrate this solution into fully operational machines across various parts of the production line. By applying edge computing, we can process sound data locally, right at the machine site, reducing latency and improving real-time fault detection capabilities. This will not only help prevent machine downtime on a larger scale but also make the system scalable across different industries.