The AI Tour Guide

donutsorelse Dec 19.2024

1 1073 Easy

Things used in this project

Hardware components

Blues Notecarrier F ×1

Blues Notecard (Cellular) (I used the global one) ×1

DFRobot UNIHIKER - IoT Python Programming Single Board Computer with Touchscreen ×1

Software apps and online services

Mind+

Inspiration

A once-in-a-lifetime vacation was rapidly approaching. My wife and I tend to wander pretty fast from place to place, which is fantastic for seeing as much as possible, but it means we don't glean a whole lot of interesting landmark information. Sure, there are signs here and there, and you can overhear a tour guide's spiel every now and again, but what if you could just have a tour guide with you at all times? With a tour guide in my pocket, I could ask for all the info I could want at any time and we could meander at our own pace. So I made one.

The code builds on some code I've been using to control my smart home. We'll still want to be able to use voice commands and text to speech, check whether we have wifi and use Blues wireless if not, and so on. We use a UNIHIKER to run the program itself.

In short, the program generates a tour based on your exact coordinates. If you don't get accurate coordinates, your tour is going to be of some other location, which would be incredibly useless. We use wifi if we have it and Blues wireless when we don't. Realistically, most tours will happen when we don't have wifi, which means getting Blues up and running is critical to our app. So, let's get that working!
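The wifi-first, Blues-second decision can be sketched roughly like this. It's a minimal sketch and not the project's actual code: `wifi_is_up`, `pick_transport`, and the `probe` parameter are hypothetical names, and opening a TCP connection to a public DNS server is just one assumed way to test for connectivity.

```python
import socket


def wifi_is_up(host="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to a public host succeeds (assumed probe)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_transport(force_blues=False, probe=wifi_is_up):
    """Choose "wifi" when the probe succeeds, otherwise fall back to "blues"."""
    if not force_blues and probe():
        return "wifi"
    return "blues"
```

Passing the probe in as a parameter makes it easy to simulate "no wifi" at your desk, which is the same idea as the `force_blues` flag in the full code.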

The Blues wireless setup

I've grown quite comfortable with the Notecarrier-F, which is great because we'll still be using it. However, we're mixing up the notecard itself to accommodate international service. This switch is easy. There's a small screw on the Notecarrier that holds the notecard in place. Unscrew it, plug in the new notecard, reattach the connections, and you're good to go.

It's a new notecard, so we just need to jump over to the quickstart guide and get it up and running. As the hints in the terminal convey, you'll need to update the firmware, add wifi access, and associate the notecard with your productUID. Create your project and associate it accordingly.

What we are looking to do is set up a route that lets us make an API call to generate an AI tour. To do this, just click Routes --> Create Route --> Proxy for Notecard Web Requests. We will be using OpenAI to generate our AI tours. Setting up a call like this is easier than it seems. It should look like the setup below, where you put your own API key in the authorization field.

The url is https://api.openai.com/v1/chat/completions

For HTTP headers, as shown, we need to include additional headers and add an Authorization header. For that field, write "Bearer", then a space, then paste in your API key from OpenAI.
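To make the route setup a bit more concrete, here is a rough sketch of the request the Notecard sends through the proxy route and of the header value the route needs. The helper names `build_tour_request` and `authorization_header` are made up for illustration, and the route alias "openai_chat" is simply the one used in the full code further down; yours has to match whatever alias you gave the route in Notehub.

```python
def build_tour_request(prompt, route="openai_chat", model="gpt-4", max_tokens=1000):
    """Build the Notecard web.post request that the Notehub proxy route forwards."""
    return {
        "req": "web.post",
        "route": route,  # must match the route alias configured in Notehub
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
    }


def authorization_header(api_key):
    """Value for the route's Authorization header: the word Bearer, a space, the key."""
    return f"Bearer {api_key}"
```

The nice part of this arrangement is that the API key lives in the route on the Blues side, so the request leaving the device only carries the route alias and the chat body.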

You'll be able to see the calls going through on the Blues side as well. Below is a successful generation of a tour of Niagara Falls (you'll see in the full ai tour guide code that I have an option to use a dummy location - mostly so that I'm not sharing where I live with the world :) ).

As far as wifi setup goes for other cases, you can browse to http://10.1.2.3/pc/network-setting on the device connected to your UNIHIKER and connect it to the wifi from there.

Making the Tour Actually Enjoyable

An AI tour implies text to speech and the built in one I was using before was pretty rough on the ears. Since we're using the open ai api anyway, I used their text to speech api as well. It sounds night and day better. Passing a massive audio file is best done over wifi, though, so I ran quite a lot of tests to determine a good option to use when we aren't on wifi. Tacotron took absolutely forever, Festival sounded decent and ran fast but had some errors, and Pico also sounded decent and ran quickly. So, we use Pico when we aren't on wifi instead of what we were using previously. It obviously isn't as good as open ai's paid api text to speech, but it is way way easier on the ears than the previous one.
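For the offline case, one possible way to call Pico is to pass the text as a plain argument list instead of building a shell string, so quotes and other special characters in the tour text can't break the command. This is only a sketch: `pico_commands` and `speak_offline` are hypothetical helpers, and it assumes the `pico2wave` and `aplay` command-line tools are installed on the device.

```python
import subprocess


def pico_commands(text, wav_path="output_pico.wav"):
    """Return two argument lists: synthesize with pico2wave, then play with aplay."""
    return [
        ["pico2wave", "-w", wav_path, text],
        ["aplay", wav_path],
    ]


def speak_offline(text, run=subprocess.run):
    """Run each command in turn; check=True raises if a tool exits with an error."""
    for cmd in pico_commands(text):
        run(cmd, check=True)
```

Calling `speak_offline("Welcome to the falls")` would write `output_pico.wav` and then play it through the default audio device.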

That brings us back to making sure the tour is actually good and relevant, which means getting good GPS coordinates. When we're on wifi we use the Google Geolocation API, because we'd get incredibly unhelpful results if we just used the coordinates associated with the wifi. When we don't have wifi, we'll be using Blues wireless anyway, so we can use one of the calls Blues offers to get coordinates. The call I used, as you'll see in the code, was "card.time" because it is super straightforward and its concise response includes the user's coordinates.
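Pulling coordinates out of a "card.time" response could look something like the sketch below. It mirrors the check in the full code (both "lat" and "lon" present, and not both zero), but `coords_from_card_time` is a hypothetical helper name and the sample response values are made up for illustration.

```python
def coords_from_card_time(rsp):
    """Return "lat,lon" from a card.time response dict, or None if no usable fix."""
    lat, lon = rsp.get("lat"), rsp.get("lon")
    if lat is None or lon is None:
        return None  # the Notecard has not reported a location yet
    if lat == 0.0 and lon == 0.0:
        return None  # treat 0,0 as "no fix" rather than a real place
    return f"{lat},{lon}"


# Made-up sample response, shaped like the fields the full code reads
sample = {"time": 1734566400, "lat": 43.0896, "lon": -79.0849}
print(coords_from_card_time(sample))
```

Returning None instead of a fake coordinate makes it harder to accidentally generate a tour of the wrong place.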

Tour Time!

How it works is we say "Butler, give me a tour" (as well as some similar options), where Butler is the trigger word for processing commands. We tell OpenAI that it's a tour guide and request a tour in a format that would work well with text to speech. We receive a thorough tour as text and read it out with the text to speech setup I described earlier.
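The trigger-word check on a transcribed sentence can be sketched as below. Treat it as an assumption-heavy illustration: `is_tour_command` is a hypothetical helper, and the phrase list stands in for the "similar options", which aren't spelled out here.

```python
TRIGGER_WORD = "butler"
# Assumed variants; swap in whatever phrasings you actually want to accept
TOUR_PHRASES = ("give me a tour", "start a tour", "tour guide")


def is_tour_command(transcript):
    """True if the trigger word is followed by one of the tour phrases."""
    text = transcript.lower()
    if TRIGGER_WORD not in text:
        return False
    # Only look at what was said after the trigger word
    command = text.split(TRIGGER_WORD, 1)[1]
    return any(phrase in command for phrase in TOUR_PHRASES)
```

With this, "Butler, give me a tour" starts a tour, while the same phrase overheard without the trigger word is ignored.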

As convenient as it would be to bring my USB speaker so the whole family could hear the tour at once, headphones felt a lot more reasonable. I don't want to be the guy blaring audio no one else asked for. There's no audio port on the UNIHIKER, but I had already split the USB port for the speaker tests, so I was able to use a USB headset just fine. We need the USB splitter because the Notecarrier-F is plugged in via USB as well.

While actually out and about using this thing I did come up with a much-needed improvement. In retrospect, it should have been obvious, but here we are. Instead of exclusively using voice commands, it makes a whole lot more sense to be able to just press a button. It can be a bit awkward to blurt out "Butler, give me a tour"! So, as you'll see in the code, we listen for a button press and start the tour if it is pressed.
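The button handling boils down to firing once per press rather than once per loop while the button is held down. Here is a small sketch of that idea on its own; `EdgeTrigger` is a hypothetical helper, and on the device you would feed it the result of `button_a.is_pressed()` each time around the loop.

```python
class EdgeTrigger:
    """Fire once per press: True only on the transition from released to pressed."""

    def __init__(self):
        self._was_pressed = False

    def update(self, is_pressed):
        fired = is_pressed and not self._was_pressed
        self._was_pressed = is_pressed
        return fired


# Simulated readings: released, pressed, still held, released, pressed again
trigger = EdgeTrigger()
print([trigger.update(state) for state in [False, True, True, False, True]])
```

Holding the button through a long tour therefore doesn't queue up a second one.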

As for the code, it will work right away once you put in your API keys and Blues product UID, but you'll need to make sure the relevant libraries are installed.

The ones that can be installed via pip are easier, as there is a built in place for installation in Mind+, which is the custom IDE from DFRobot that directly supports the Unihiker. Those installs are below:

pip install speechrecognition

pip install pyttsx3

pip install simpleaudio

pip install pydub

pip install notecard

pip install requests

pip install pinpong

In particular, to get Pico text to speech working (for when we don't have wifi), you'll need to install it via ssh. To do so, just go to a terminal and type ssh [email protected] while your UNIHIKER is connected to the computer. The password is dfrobot.

Then just run this and you should be good to go:

sudo apt update

sudo apt install -y python3 python3-pip python3-serial

sudo apt install -y alsa-utils

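sudo apt install -y espeak-ng libttspico-utils

(pico2wave is provided by the libttspico-utils package, and aplay is already part of alsa-utils, so neither is installed under its own name.)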
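After the installs, a quick sanity check that the command-line tools are actually on the PATH can save some head scratching later. This is just a sketch using the standard library; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("pico2wave", "aplay")):
    """Return the names of any required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("Offline text to speech tools found.")
```

Run it on the UNIHIKER itself (over ssh or from Mind+), since it checks the machine it runs on.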

Conclusion

Even after I got home I couldn't stop myself from further iterating on the code to make it nice, so it should be in a good spot to give you an on-the-go AI tour wherever your travels may take you! I added all the features I thought would be nice, but I'm always curious what spin-offs and enhancements others come up with. Please share if you iterate on this project! Regardless, I hope you enjoyed it. Have a good one.

CODE

import os
import speech_recognition as sr
import time
import logging
import pyttsx3
from notecard import notecard
import serial
import requests
import simpleaudio as sa
from pydub import AudioSegment
from pinpong.board import Board
from pinpong.extension.unihiker import button_a, button_b

os.environ["PYTHONWARNINGS"] = "ignore"  # Suppress excess logging

# Configuration
NOTECARD_SERIAL_PORT = "/dev/ttyACM0"
NOTECARD_BAUD_RATE = 9600
PRODUCT_UID = "<your product uid here>"
BLUES_OPENAI_CHAT_ROUTE = "openai_chat"
BLUES_OPENAI_TTS_ROUTE = "openai_vision"
OPENAI_API_KEY = "<your key here>"
GOOGLE_API_KEY = "<your key here>"
force_blues = False  # Set to True to force Blues usage (simulate no Wi-Fi)

# Initialize board and logging
board = Board()
board.begin()
logging.basicConfig(level=logging.INFO)

engine = pyttsx3.init()
engine.setProperty("rate", 150)
engine.setProperty("volume", 1.0)

notecard_port = None


# Helper Functions
def setup_serial_connection(port, baud_rate):
    try:
        return serial.Serial(port, baud_rate)
    except Exception as e:
        logging.error(f"Failed to open serial port: {e}")
        return None


def setup_notecard(serial_port):
    try:
        card = notecard.OpenSerial(serial_port)
        req = {"req": "hub.set", "product": PRODUCT_UID, "mode": "continuous"}
        rsp = card.Transaction(req)
        logging.info(f"Notecard setup response: {rsp}")
        return card
    except Exception as e:
        logging.error(f"Failed to initialize Notecard: {e}")
        return None


def initialize_blues_service():
    global notecard_port
    serial_port = setup_serial_connection(NOTECARD_SERIAL_PORT, NOTECARD_BAUD_RATE)
    if serial_port:
        notecard_port = setup_notecard(serial_port)
    else:
        logging.error("No valid serial port found for Notecard.")


def get_coordinates():
    """Fetch coordinates via Google Geolocation API or Blues Notecard."""
    # return "43.0896,-79.0849"  # Test coordinates for Niagara Falls (west longitude is negative) - I don't want to tell you guys where I live!
    global force_blues
    if not force_blues:
        try:
            logging.info("Attempting to retrieve coordinates via Google Geolocation API...")
            payload = {"considerIp": True}  # Boolean, serialized as JSON true
            response = requests.post(
                f"https://www.googleapis.com/geolocation/v1/geolocate?key={GOOGLE_API_KEY}",
                json=payload,
                timeout=15
            )
            if response.status_code == 200:
                data = response.json()
                lat, lon = data['location']['lat'], data['location']['lng']
                logging.info(f"Coordinates retrieved via Google API: {lat},{lon}")
                return f"{lat},{lon}"
        except Exception as e:
            logging.warning(f"Google Geolocation API failed: {e}")

    if notecard_port:
        try:
            logging.info("Attempting to retrieve coordinates via Blues Notecard...")
            req = {"req": "card.time"}
            rsp = notecard_port.Transaction(req)
            if "lat" in rsp and "lon" in rsp:
                lat, lon = rsp["lat"], rsp["lon"]
                if lat != 0.0 or lon != 0.0:
                    logging.info(f"Coordinates retrieved via Blues Notecard: {lat},{lon}")
                    return f"{lat},{lon}"
        except Exception as e:
            logging.error(f"Failed to fetch coordinates from Blues Notecard: {e}")
    return "0.0,0.0"


def generate_tour(prompt):
    """Send a prompt to OpenAI API via Wi-Fi or Blues Wireless."""
    global force_blues
    if not force_blues:
        try:
            response = requests.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
                json={"model": "gpt-4", "messages": [{"role": "user", "content": prompt}], "max_tokens": 1000},
                timeout=15
            )
            if response.status_code == 200:
                tour_text = response.json()["choices"][0]["message"]["content"]
                logging.info(tour_text)
                audio_path = openai_tourguide_speech(tour_text)
                if not audio_path:
                    logging.warning("Failed to generate speech. Skipping audio playback.")
                    return
                play_audio(audio_path)
                return  # Tour delivered over Wi-Fi; don't fall through and repeat it via Blues
            else:
                logging.error(f"Wi-Fi OpenAI API Error: {response.status_code}, {response.text}")
        except Exception as e:
            logging.warning(f"Wi-Fi OpenAI call failed: {e}")

    if notecard_port:
        try:
            req = {
                "req": "web.post",
                "route": BLUES_OPENAI_CHAT_ROUTE,
                "body": {
                    "model": "gpt-4",
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 1000
                }
            }
            logging.info("Sending request to OpenAI via Blues...")
            rsp = notecard_port.Transaction(req)
            if rsp.get("result") == 200:
                body = rsp["body"]
                response_text = body["choices"][0]["message"]["content"]
                logging.info(response_text)
                try:
                    logging.info("Using Pico TTS")
                    # Escape the characters the shell treats specially inside double quotes
                    sanitized_text = (response_text.replace('\\', '\\\\').replace('"', '\\"')
                                      .replace('$', '\\$').replace('`', '\\`'))
                    os.system(f'pico2wave -w output_pico.wav "{sanitized_text}" && aplay output_pico.wav')
                    logging.info("Done playing tour audio")
                except Exception as e:
                    logging.error(f"Pico TTS call failed: {e}")
            else:
                logging.error(f"Blues OpenAI call failed: {rsp.get('body', {}).get('err', 'Unknown error')}")
        except Exception as e:
            logging.error(f"Blues OpenAI call failed: {e}")
    return None


def openai_tourguide_speech(text):
    """Generate speech using OpenAI's TTS API."""
    try:
        response = requests.post(
            "https://api.openai.com/v1/audio/speech",
            headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
            json={"model": "tts-1", "voice": "alloy", "input": text},
            timeout=15
        )
        if response.status_code == 200:
            with open("speech.mp3", "wb") as f:
                f.write(response.content)
            return "speech.mp3"
        logging.error(f"OpenAI TTS API Error (Wi-Fi): {response.status_code}, {response.text}")
    except Exception as e:
        logging.warning(f"Wi-Fi TTS call failed: {e}")


def play_audio(file_path):
    """Play audio from the given file."""
    try:
        if file_path.endswith(".mp3"):
            audio = AudioSegment.from_file(file_path, format="mp3")
            normalized_audio = audio.apply_gain(-audio.max_dBFS)
            wav_file_path = file_path.replace(".mp3", ".wav")
            normalized_audio.export(wav_file_path, format="wav")
            file_path = wav_file_path

        wave_obj = sa.WaveObject.from_wave_file(file_path)
        play_obj = wave_obj.play()
        play_obj.wait_done()
    except Exception as e:
        logging.error(f"Failed to play audio: {e}")


def start_tour():
    """Handle the complete process of starting a tour."""
    coords = get_coordinates()
    if coords == "0.0,0.0":
        logging.warning("Failed to retrieve valid coordinates. Skipping tour generation.")
        return

    generate_tour(f"You are a tour guide.  Please give a thorough and interesting tour for the following specific coordinates.  Do not repeat the coordinates.  Simply give a guided tour of all the most interesting things about that location as if you are really there: {coords}")
    


def main():
    initialize_blues_service()
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    button_a_pressed = False
    button_b_pressed = False

    try:
        while True:
            if button_a.is_pressed():
                if not button_a_pressed:
                    logging.info("Button A pressed: Starting tour...")
                    start_tour()
                    button_a_pressed = True
            else:
                button_a_pressed = False

            if button_b.is_pressed():
                if not button_b_pressed:
                    logging.info("Button B pressed: Starting another action...")
                    start_tour()
                    button_b_pressed = True
            else:
                button_b_pressed = False

            time.sleep(0.1)
    except KeyboardInterrupt:
        logging.info("Shutting down...")


if __name__ == "__main__":
    main()

License

All Rights Reserved

Tags DFRobot UNIHIKER AI IoT Other topics
