Hey there, tech explorer! Ever wondered what it would be like to predict the price of a car using Python? Sounds like wizardry, right? Well, buckle up! In this tutorial, we're going to build a simple car valuation app using Pandas, scikit-learn Linear Regression, and CustomTkinter.
Important: The car data is fictional, so don't try to sell your rusty bike for a Ferrari price based on this.
Project objective:
By the end of this tutorial, you'll learn:
1. How to use Pandas to read data from CSV files and to clean and manipulate that data.
2. How to use scikit-learn to apply Linear Regression for car price predictions.
3. How to create a simple graphical user interface (GUI) using CustomTkinter.
Here's how your project folder should look after all folders and files are created:
$ tree VehicleValuation/
VehicleValuation/
|-- data
|   `-- cars.csv.bz2
|-- lib
|   `-- gui.py
`-- main.py
2 directories, 3 files
At the end of this guide you will find a ZIP archive with all the necessary folders and files.
Prepare the project:
Create a new folder named VehicleValuation and, inside it, two more folders named data and lib. In the project root folder, create a file named main.py, and inside lib a file named gui.py.
As already mentioned, you can find the CSV data set in the ZIP archive. Copy the file named: cars.csv.bz2 to the folder: data. You don't need to unpack cars.csv.bz2! Pandas can work with *.bz2 archives directly. This saves you some storage space on the Unihiker.
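To illustrate the point about compressed files, here is a minimal sketch. It assumes a tiny, made-up sample with the same column names as the tutorial data set; the file name sample.csv.bz2 and the rows are hypothetical and only serve the demonstration.

```python
import bz2
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample rows; the real cars.csv.bz2 comes from the tutorial archive.
csv_text = (
    "Marke,Kilometerstand,PS,Erstzulassung,Kraftstoff,Preis (EUR)\n"
    "Audi,50000,150,2018,Benzin,21000.0\n"
    "BMW,80000,190,2016,Diesel,19500.0\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    archive = Path(tmp_dir) / "sample.csv.bz2"
    # Store the CSV text as a bz2-compressed file.
    archive.write_bytes(bz2.compress(csv_text.encode("utf-8")))

    # compression="infer" is the default, so the .bz2 suffix selects the codec.
    sample = pd.read_csv(archive)

print(sample.shape)
```

Because Pandas infers the compression from the file extension, no extra argument is needed when the file ends in .bz2.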
Once this is done, you can copy the following code examples into the respective Python files.
main.py
This is the brain of the application. Here's what it does:
1. Loads and preprocesses the CSV data using Pandas.
2. Encodes the categorical columns with OneHotEncoder and ColumnTransformer, then splits the data for training and testing with train_test_split (all from scikit-learn).
3. Trains a Linear Regression model to predict car prices.
4. Connects the backend logic with the CustomTkinter GUI.
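The encoding step from the list above can be sketched in isolation. This is a simplified illustration with made-up rows, not the tutorial's data: it shows how a text column such as Marke becomes numeric indicator columns while numeric columns pass through unchanged.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows, for illustration only.
cars = pd.DataFrame({
    "Marke": ["Audi", "BMW", "VW", "BMW"],
    "PS": [150, 190, 110, 140],
})

encoder = ColumnTransformer(
    [("Marke", OneHotEncoder(drop="first"), ["Marke"])],
    remainder="passthrough",  # keep the numeric PS column as it is
)
encoded = encoder.fit_transform(cars)

# The result can be sparse or dense; convert to a dense array for printing.
dense = np.asarray(encoded.toarray() if hasattr(encoded, "toarray") else encoded)
print(dense)
```

With drop="first", the first category (here Audi) gets no column of its own and is represented by zeros in the remaining brand columns. This avoids redundant columns in a linear model.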
from os.path import dirname, abspath, exists, join
from sys import exit
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from gc import collect
from lib.gui import GUI
import pandas as pd
DISPLAY_WIDTH: int = 240
DISPLAY_HEIGHT: int = 320
DATA_FILE: str = "data/cars.csv.bz2"
def predict_price(l_model, l_cf, brand, mileage, horsepower, registration, fuel) -> float:
    """
    Predict the price of a car based on various input features using a pre-trained regression
    model and a transformation pipeline.

    :param l_model: The regression model used to predict the price.
    :type l_model: LinearRegression
    :param l_cf: The transformation pipeline used to preprocess the input features.
    :type l_cf: ColumnTransformer
    :param brand: The brand of the car.
    :type brand: str
    :param mileage: The mileage of the car in kilometers.
    :type mileage: int
    :param horsepower: The horsepower of the car.
    :type horsepower: int
    :param registration: The registration year of the car.
    :type registration: int
    :param fuel: The type of fuel used by the car.
    :type fuel: str
    :return: The predicted price of the car as a float.
    :rtype: float
    """
    # Build a one-row DataFrame with the same column names used for training
    x_pred = pd.DataFrame([
        [brand, mileage, horsepower, registration, fuel]
    ], columns=['Marke', 'Kilometerstand', 'PS', 'Erstzulassung', 'Kraftstoff'])

    # predict() returns a one-element array; convert it to a plain float
    val = l_model.predict(l_cf.transform(x_pred))
    return float(val[0])
if __name__ == '__main__':
    current_file_path = dirname(abspath(__file__))
    data_path = join(current_file_path, DATA_FILE)

    if not exists(data_path):
        print(f'[ERROR] File: {data_path} not found.')
        exit(1)

    print(f'[INFO] Loading and preparing data from {data_path}...')
    # Compact dtypes keep the memory footprint small
    dtype_dict = {
        'Marke': 'category',
        'Kilometerstand': 'int32',
        'PS': 'int16',
        'Erstzulassung': 'int16',
        'Kraftstoff': 'category',
        'Preis (EUR)': 'float32'
    }
    df = pd.read_csv(data_path, dtype=dtype_dict, encoding='utf-8')
    df.dropna(inplace=True)

    # Collect the option lists and value ranges for the GUI widgets
    car_brands = df['Marke'].unique()
    df['Kraftstoff'] = df['Kraftstoff'].replace('Benzin', 'Petrol')
    car_fuel = df['Kraftstoff'].unique()
    mileage_limits = (df['Kilometerstand'].min(), df['Kilometerstand'].max())
    horsepower_limits = (df['PS'].min(), df['PS'].max())
    registration_limits = (df['Erstzulassung'].min(), df['Erstzulassung'].max())

    x = df[
        ['Marke', 'Kilometerstand', 'PS', 'Erstzulassung', 'Kraftstoff']
    ]

    # One-hot encode the categorical columns; sparse output is the default.
    # (The explicit `sparse` keyword was renamed to `sparse_output` in newer
    # scikit-learn releases, so it is omitted here to work with both.)
    cf = ColumnTransformer([
        ('Marke', OneHotEncoder(drop='first'), ['Marke']),
        ('Kraftstoff', OneHotEncoder(drop='first'), ['Kraftstoff'])
    ], remainder='passthrough')
    cf.fit(x)
    x_transformed = cf.transform(x)
    y = df['Preis (EUR)']

    x_train, x_test, y_train, y_test = train_test_split(x_transformed, y, train_size=0.75)

    print('[INFO] Training model...')
    model = LinearRegression()
    model.fit(x_train, y_train)
    print(f'[INFO] Model trained with score: {model.score(x_test, y_test)}')

    # Free the training data; only the model and the transformer are needed from here on
    del df, x, x_transformed, y, x_train, x_test, y_train, y_test
    collect()

    print('[INFO] Start application...')
    app = GUI(width=DISPLAY_WIDTH,
              height=DISPLAY_HEIGHT,
              brand=car_brands,
              fuel=car_fuel,
              ps=horsepower_limits,
              km=mileage_limits,
              reg=registration_limits)
    app.bind_calculate_button(
        lambda brand, mileage, horsepower, registration, fuel: predict_price(
            model, cf, brand, mileage, horsepower, registration, fuel
        )
    )
    app.mainloop()
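A short note on the score that main.py prints: for LinearRegression, score() returns the coefficient of determination (R²), where 1.0 means a perfect fit. The sketch below recomputes that value by hand on synthetic data. The numbers are invented for illustration and have nothing to do with the tutorial's CSV file.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): the price falls linearly with mileage, plus noise.
rng = np.random.default_rng(seed=0)
mileage = rng.uniform(10_000, 200_000, size=200).reshape(-1, 1)
price = 30_000 - 0.1 * mileage[:, 0] + rng.normal(0, 1_000, size=200)

model = LinearRegression().fit(mileage, price)

# R^2 = 1 - (residual sum of squares / total sum of squares)
predicted = model.predict(mileage)
ss_res = np.sum((price - predicted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Both numbers should agree up to floating-point rounding.
print(r2_manual, model.score(mileage, price))
```

In main.py the score is computed on the test split, so it describes how well the model generalizes to rows it has not seen during training.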
lib/gui.py
This file contains the user interface code. The GUI uses CustomTkinter for a modern look. Here are the highlights:
1. CTkSpinBox: Custom numeric input widget for mileage, horsepower, and registration year.
2. CTkOptionMenu: Dropdowns for car brand and fuel type.
3. Event Binding: The bind_calculate_button() method connects the UI button to the prediction logic.
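The event binding idea from the list above can be tried without any GUI toolkit. The following sketch uses hypothetical stand-in classes (FakeButton and Form are not part of CustomTkinter) to show how a zero-argument button command wraps a five-argument callback.

```python
from typing import Callable, Optional


class FakeButton:
    """Hypothetical stand-in for a GUI button that stores a zero-argument command."""

    def __init__(self) -> None:
        self.command: Optional[Callable[[], None]] = None

    def configure(self, command: Callable[[], None]) -> None:
        self.command = command

    def click(self) -> None:
        if self.command is not None:
            self.command()


class Form:
    """Hypothetical form that collects five inputs and hands them to a callback."""

    def __init__(self) -> None:
        self.button = FakeButton()
        self.inputs = ("Audi", 50000, 150, 2018, "Petrol")
        self.result: Optional[float] = None

    def bind_calculate_button(self, callback: Callable[..., float]) -> None:
        # The lambda captures the callback, so the button needs no arguments.
        self.button.configure(command=lambda: self._on_calculate(callback))

    def _on_calculate(self, callback: Callable[..., float]) -> None:
        self.result = callback(*self.inputs)


form = Form()
# Made-up pricing rule, used here only in place of the trained model.
form.bind_calculate_button(lambda brand, km, ps, year, fuel: float(100 * ps - km // 20))
form.button.click()
print(form.result)
```

The benefit of this design is that the GUI never imports the model: main.py decides what happens on a button press and passes that logic in as a function.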
from customtkinter import CTk, set_appearance_mode, CTkFrame, CTkButton, CTkLabel, CTkOptionMenu, CTkEntry
from typing import Callable
import numpy as np
class CTkSpinBox(CTkFrame):

    def __init__(self, master, width, min_value, max_value, step_size=1):
        """
        Class constructor for CTkSpinBox widget.

        :param master: The parent widget where this widget will be placed.
        :type master: Any
        :param width: The total width of the numerical widget.
        :type width: int
        :param min_value: The minimum value that the numerical widget can hold.
        :type min_value: int
        :param max_value: The maximum value that the numerical widget can hold.
        :type max_value: int
        :param step_size: The step size by which the value is incremented or decremented when clicking the buttons.
        :type step_size: int, optional
        """
        super().__init__(master)

        self.step_size = step_size
        self.min_value = min_value
        self.max_value = max_value

        self.entry = CTkEntry(self, width=width - 60, justify="center")
        self.entry.grid(row=0, column=0, columnspan=2)
        self.entry.insert(0, str(min_value))
        # Block keyboard and mouse input so the value changes only via the buttons
        self.entry.bind("<Key>", lambda e: "break")
        self.entry.bind("<Button-1>", lambda e: "break")

        self.increment_button = CTkButton(self, text="+", width=30, command=self.increment)
        self.increment_button.grid(row=0, column=2, padx=2)

        self.decrement_button = CTkButton(self, text="-", width=30, command=self.decrement)
        self.decrement_button.grid(row=0, column=3, padx=1)

    def increment(self) -> None:
        """
        Increments the current numeric value in the entry widget by a predefined step size.

        :return: None
        """
        value = int(self.entry.get())

        if value + self.step_size <= self.max_value:
            self.entry.delete(0, "end")
            self.entry.insert(0, str(value + self.step_size))

    def decrement(self) -> None:
        """
        Decrements the current value in the entry field by a predefined step size.

        :return: None
        """
        value = int(self.entry.get())

        if value - self.step_size >= self.min_value:
            self.entry.delete(0, "end")
            self.entry.insert(0, str(value - self.step_size))

    def get(self) -> int:
        """
        Retrieves the current value entered and returns it as an integer.

        :return: The value retrieved from the entry, converted to an integer.
        :rtype: int
        """
        return int(self.entry.get())
class GUI(CTk):
    _MARGIN: int = 5
    _FONT: tuple = ("Arial", 18, "bold")

    def __init__(self, width: int, height: int, brand: np.ndarray, fuel: np.ndarray, km: tuple, ps: tuple, reg: tuple):
        """
        Class constructor for CTk application.

        :param width: The width of the user interface in pixels.
        :type width: int
        :param height: The height of the user interface in pixels.
        :type height: int
        :param brand: An array of available vehicle brand options.
        :type brand: numpy.ndarray
        :param fuel: An array of available fuel type options.
        :type fuel: numpy.ndarray
        :param km: A tuple specifying the minimum and maximum vehicle mileage.
        :type km: tuple
        :param ps: A tuple specifying the minimum and maximum vehicle horsepower.
        :type ps: tuple
        :param reg: A tuple specifying the minimum and maximum vehicle registration years.
        :type reg: tuple
        """
        super().__init__()

        brand_options = brand.tolist()
        fuel_options = fuel.tolist()

        min_km = km[0]
        max_km = km[1]
        default_km = 50000

        min_ps = ps[0]
        max_ps = ps[1]
        default_ps = 120

        min_year = reg[0]
        max_year = reg[1]
        default_year = (min_year + max_year) // 2

        self.geometry(f"{width}x{height}+0+0")
        self.resizable(width=False, height=False)
        self.configure(padx=self._MARGIN, pady=self._MARGIN)
        set_appearance_mode("dark")

        self._headline = CTkLabel(self, text='Price Prediction', font=self._FONT)
        self._headline.grid(row=0, column=0, columnspan=2, padx=self._MARGIN, pady=self._MARGIN)

        self._brand = CTkLabel(self, text='Brand:')
        self._brand.grid(row=1, column=0, sticky="e", padx=self._MARGIN)
        self._om_brand = CTkOptionMenu(self, values=brand_options)
        self._om_brand.set(str(brand_options[0]))
        self._om_brand.grid(row=1, column=1, sticky="w", pady=self._MARGIN)

        self._fuel = CTkLabel(self, text='Fuel:')
        self._fuel.grid(row=2, column=0, sticky="e", padx=self._MARGIN)
        self._om_fuel = CTkOptionMenu(self, values=fuel_options)
        self._om_fuel.set(str(fuel_options[0]))
        self._om_fuel.grid(row=2, column=1, sticky="w", pady=self._MARGIN)

        self._mileage = CTkLabel(self, text='Kilometers:')
        self._mileage.grid(row=3, column=0, sticky="e", padx=self._MARGIN)
        self._e_mileage = CTkSpinBox(self, width=135, min_value=min_km, max_value=max_km, step_size=1)
        self._e_mileage.grid(row=3, column=1, sticky="w", pady=self._MARGIN)
        self._e_mileage.entry.delete(0, "end")
        self._e_mileage.entry.insert(0, str(default_km))

        self._horsepower = CTkLabel(self, text='Horsepower:')
        self._horsepower.grid(row=4, column=0, sticky="e", padx=self._MARGIN)
        self._e_horsepower = CTkSpinBox(self, width=135, min_value=min_ps, max_value=max_ps, step_size=1)
        self._e_horsepower.grid(row=4, column=1, sticky="w", pady=self._MARGIN)
        self._e_horsepower.entry.delete(0, "end")
        self._e_horsepower.entry.insert(0, str(default_ps))

        self._registration = CTkLabel(self, text='Registration:')
        self._registration.grid(row=5, column=0, sticky="e", padx=self._MARGIN)
        self._e_registration = CTkSpinBox(self, width=135, min_value=min_year, max_value=max_year, step_size=1)
        self._e_registration.grid(row=5, column=1, sticky="w", pady=self._MARGIN)
        self._e_registration.entry.delete(0, "end")
        self._e_registration.entry.insert(0, str(default_year))

        self._btn = CTkButton(self, text='Calculation')
        self._btn.grid(row=6, column=0, columnspan=2, pady=self._MARGIN, padx=self._MARGIN)

        self._price = CTkLabel(self, text='Select and predict', font=self._FONT)
        self._price.grid(row=7, column=0, columnspan=2, padx=self._MARGIN, pady=self._MARGIN)

    def _on_calculate(self, callback: Callable[[str, int, int, int, str], float]) -> None:
        """
        Handles the calculation event, invoking the provided callback with required parameters
        to compute the predicted price of a vehicle. Updates the price label with the result.

        :param callback: A callable that computes the predicted price of the vehicle.
        :type callback: Callable[[str, int, int, int, str], float]
        :return: None
        """
        brand = str(self._om_brand.get())
        mileage = int(self._e_mileage.get())
        horsepower = int(self._e_horsepower.get())
        registration = int(self._e_registration.get())
        fuel = str(self._om_fuel.get())

        predicted_price = callback(brand, mileage, horsepower, registration, fuel)

        if isinstance(predicted_price, np.ndarray):
            predicted_price = predicted_price.item()

        self._price.configure(text=f"{predicted_price:.2f} EUR")

    def bind_calculate_button(self, callback: Callable[[str, int, int, int, str], float]) -> None:
        """
        Binds a callback function to the calculate button. The callback function is triggered when
        the button is pressed.

        :param callback: A callable that accepts five parameters.
        :type callback: Callable[[str, int, int, int, str], float]
        :return: None
        """
        self._btn.configure(command=lambda: self._on_calculate(callback))
Unihiker provides various options for uploading. For example, it is possible to upload the entire project to the Unihiker via SCP, SMB or FTP. The online documentation will be very helpful when making your selection!
Here is an example for SCP (user: root - password: dfrobot). Replace <unihiker-ip> with the IP address of your Unihiker:
$ scp -r VehicleValuation/ root@<unihiker-ip>:/root/
Before you can use the application, you must make sure that all the required Python libraries and modules are installed on the Unihiker. All of them are already installed, with the exception of customtkinter.
Connect to the Unihiker via SSH (user: root - password: dfrobot, with <unihiker-ip> replaced by the IP address of your Unihiker) and execute the following commands:
# SSH connection
$ ssh root@<unihiker-ip>
# install customtkinter (via pip)
$ pip3 install customtkinter
# verify installation (optional)
$ pip3 freeze
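As an alternative to reading through the pip3 freeze output, the installation can also be checked from Python. This is a small sketch using only the standard library; the helper name missing_modules is made up for this example.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Import names used by the tutorial (scikit-learn is imported as "sklearn").
required = ["pandas", "sklearn", "customtkinter"]
print(missing_modules(required))
```

An empty list means that all three modules can be imported on the device.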
You can start the application via Unihiker touch screen or command line. If the SSH connection to Unihiker still exists, run the following command in the terminal:
# run Python application
$ python3 /root/VehicleValuation/main.py
After starting you can use the application. Select the required values and press the calculate button on the touch screen.
Ideas for taking the project further:
1. Add Features: Include more car attributes for a better prediction model.
2. Improve the Model: Use advanced machine learning models, such as Random Forest or Gradient Boosting, for more accurate predictions.
3. Enhance the GUI: Add more features to the user interface.
4. Use real data: Replace the fictional CSV file with real car data.
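As a starting point for the model idea above, here is a minimal sketch that swaps Linear Regression for scikit-learn's RandomForestRegressor. It is not part of the original project: the rows are invented, and the hyperparameters (n_estimators=50, random_state=0) are arbitrary assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows with the same column names as the tutorial data set.
cars = pd.DataFrame({
    "Marke": ["Audi", "BMW", "VW", "BMW", "Audi", "VW"],
    "Kilometerstand": [50000, 80000, 120000, 30000, 150000, 60000],
    "PS": [150, 190, 110, 250, 140, 90],
    "Erstzulassung": [2018, 2016, 2012, 2020, 2010, 2017],
    "Kraftstoff": ["Petrol", "Diesel", "Diesel", "Petrol", "Diesel", "Petrol"],
    "Preis (EUR)": [21000.0, 19500.0, 7500.0, 42000.0, 6000.0, 11000.0],
})
features = cars.drop(columns="Preis (EUR)")
target = cars["Preis (EUR)"]

# Encoding and model bundled in one pipeline, so raw rows can be passed to predict().
pipeline = Pipeline([
    ("encode", ColumnTransformer(
        [("categories", OneHotEncoder(handle_unknown="ignore"), ["Marke", "Kraftstoff"])],
        remainder="passthrough",
    )),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
])
pipeline.fit(features, target)

prediction = pipeline.predict(features.iloc[[0]])
print(prediction.shape)
```

A forest with many trees needs more memory and training time than a linear model, which may matter on a small device such as the Unihiker, so it is worth testing with a modest number of trees first.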