+++
date = '2026-05-17T23:57:15-06:00'
draft = false
title = "Katchi, a dragon's best friend"
summary = "Building a smart speaker with ESPHome for Home Assistant, and it just so happens to look like a kobold."
tags = ['kobold', 'esp32']
+++

## A smart-home for a Dragon

{{< typeit tag=h3 speed=50 breakLines=false loop=true >}}
"It's the Future..." "Dumb homes are so 2010" "Is all of this really necessary?" — Concerned Friends
{{< /typeit >}}

### Smart-homes

The present state of smart-home choices is fairly acceptable. You have your major players, **Google**, **Apple**, **Amazon** and their associated services like Google Home or **Alexa**. These systems are fairly easy to set up: plug in the new device, type in some credentials or answer a prompt on your phone, and you are done. Most of these systems rely on a central hub that orchestrates the entire smart home. But all these systems have one fatal annoyance: they all require access to the internet.

### Internet dependency

In recent years, it has become common to run into issues with major providers. Privacy concerns, outages and the forced obsolescence of existing systems put a lot of pressure on me when building my first smart home. Sure, the big players make it easy to set up and use, but for me the non-monetary cost was just too great. Besides the limitations in software, knowing that I would be shit out of luck turning off my lights if I had an internet outage or, god forbid, the provider had one, turned me away from the major providers.

### So what did I use?

After spending a lot of time frustrated with my options and struggling to automate my smart-home the way I wanted, I went down the rabbit hole and found **[Home Assistant](https://www.home-assistant.io/)**.

![Home Assistant](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fcommunity-assets.home-assistant.io%2Foriginal%2F4X%2F5%2F0%2Fe%2F50e585faea85010ebb16d3d466f071ef90ec1393.png&f=1&nofb=1&ipt=73955a250f2bf73ba578833607b6a377d67ea436a1562e35b202fb2273b3d35a)

Unlike the big-name smart-homes, **Home Assistant** is a self-hosted option that runs on your own hardware and connects locally to supported devices. It supports a wide range of [devices and integrations](https://www.home-assistant.io/integrations/?brands=featured) and is fairly easy to set up. I won't expound on it much more here, but I will link to the [getting started](https://www.home-assistant.io/installation/), [documentation](https://www.home-assistant.io/docs/) and [community](https://community.home-assistant.io/) pages for more information.

### So what's the problem?

For all the amazing options that **Home Assistant** gives us, it has one fairly significant miss: smart speaker integration.

## Home Assistant Smart Speaker

The options for **Home Assistant** smart speakers are quite limited. There is only one official product as of the date of publishing this post.

{{< externalLink url="https://www.home-assistant.io/voice-pe/" >}}

While the **Home Assistant Voice PE** works decently, it is the only off-the-shelf option for **Home Assistant**, which, considering all the freedom **Home Assistant** gives us, feels quite limiting. However, there is a solution.

### The Solution

Thankfully, we are not constrained by the limitations of existing hardware thanks to microcontrollers, specifically the ESP family of microcontrollers.

![ESPHome](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fesphome.io%2Fimages%2Fog.webp&f=1&nofb=1&ipt=fd9bfb5ff1845d2803627ce6224161f86267d81a0f426d077cbae7deaeb75215)

Using **[ESPHome](https://esphome.io)** you can create a whole myriad of smart devices based on the [ESP32 microcontroller](https://www.espressif.com/en/products/socs/esp32).
It provides a very diverse family of options that can fit nearly any use-case. Think of it as an alternative to Arduino, where instead of writing C++ code you write YAML configuration files that define and configure your ESP device.

Knowing this, I set out to make my own smart speaker.

## Building a Katchi (smart speaker)

So what does it take to make a smart speaker? You ultimately need a few key things: a speaker, a microphone and a Wi-Fi-enabled microcontroller. For my purposes I decided I also wanted a screen so I could give my Katchi a little more personality. My main requirement for the display was that it be circular, as my intent was to use it as the eye of my smart speaker.

### Waveshare ESP32-S3

I ended up landing on the [Waveshare ESP32-S3 1.75inch AMOLED Round Touch Display Development Board](https://www.waveshare.com/esp32-s3-touch-amoled-1.75.htm). Despite being quite a mouthful, this handy little device is absolutely packed with sensors and features, as well as a glorious round AMOLED display.

{{< gallery >}}
{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-1.jpg" alt="Gallery image 1" figureClass="grid-w33" >}}
{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-2.jpg" alt="Gallery image 2" figureClass="grid-w33" >}}
{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-3.jpg" alt="Gallery image 3" figureClass="grid-w33" >}}
{{< /gallery >}}

One of the fundamental things to check when considering ESP devices is which components and associated drivers are available in **ESPHome**.
The Waveshare device I picked has the following components, listed with their ESPHome support:

| Device  | Purpose                                 | Supported                                              |
|---------|-----------------------------------------|--------------------------------------------------------|
| TCA9554 | GPIO expander for additional interfaces | [Yes](https://esphome.io/components/pca9554/)          |
| ES7210  | ADC for microphones                     | [Yes](https://esphome.io/components/audio_adc/es7210/) |
| ES8311  | DAC for speaker                         | [Yes](https://esphome.io/components/audio_dac/es8311/) |
| CO5300  | Display controller for AMOLED           | [Yes](https://esphome.io/components/display/mipi_spi/) |
| CST9217 | Touchscreen controller                  | No                                                     |

To fill the support gap for the touchscreen, we can use an [external driver](https://github.com/shelson/esphome-cst9217).

### Configuration

Since **ESPHome** uses a YAML configuration file to define the device, configuring the device is fairly straightforward.

#### Basic Configuration

We first need to start with the basic configuration of the device. When setting up the esp32 configuration, it is crucial to be aware of the flash size and CPU frequency, otherwise your device may not run correctly. The device I am using has a **16MB flash** and a **240MHz CPU**. We also use the esp-idf framework. This is the preferred framework for ESP devices, as the Arduino framework is not as feature-rich and is no longer supported by newer devices.

```yaml
esphome:
  name: kobold
  friendly_name: Kobold

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHZ
  framework:
    type: esp-idf
```

We also need to set up **psram**, the **i2c bus** and the **SPI bus**. This requires a firm understanding of the GPIO pins and their associated functions.
Waveshare provides the following pinout diagram for the device:

[ESP32-S3-Touch-AMOLED-1.75.pdf](https://files.waveshare.com/wiki/ESP32-S3-Touch-AMOLED-1.75/ESP32-S3-Touch-AMOLED-1.75.pdf)

We will rely on this document extensively for the rest of the configuration.

The **psram** is important for making sure that the device does not run out of memory, and its config is quite simple.

```yaml
psram:
  mode: octal
  speed: 80MHz
```

The **i2c bus** is used for the touchscreen and for other future components. It is an important communication protocol for microcontrollers, as it allows a large number of sensors and devices to connect to the same bus.

```yaml
i2c:
  sda: GPIO15
  scl: GPIO14
  scan: true
  id: bus_a
```

The **SPI bus** is essential for the display module, as it communicates via quad SPI, which is functionally a quad-channel serial bus.

```yaml
spi:
  - id: spi_bus
    clk_pin: GPIO2
    mosi_pin: GPIO1
    miso_pin:
      number: GPIO3
      ignore_strapping_warning: true
  - id: quad_spi_bus
    type: quad
    clk_pin: GPIO38
    data_pins:
      - GPIO4
      - GPIO5
      - GPIO6
      - GPIO7
```

#### Display Configuration

For our basic display configuration we will use the **mipi_spi** display driver. This driver specifically requires the **quad SPI bus** to be configured, as well as the correct `data_rate`. You can play with the data rate for a **quad SPI display**, as it affects how the display refreshes and draws images.

```yaml
display:
  - platform: mipi_spi
    id: disp1
    model: CO5300
    bus_mode: quad
    reset_pin: GPIO39
    cs_pin: GPIO12
    data_rate: 80MHz
    dimensions:
      height: 466
      width: 466
      offset_width: 6
```

#### Audio Configuration

Our audio configuration is quite a bit more complex. It requires its own **I2S bus** as well as the **DAC** and **ADC** configs. Finally, we need to configure the audio components themselves.

Our **audio I2S bus** is simpler than our **quad SPI bus**.

> We set `ignore_strapping_warning` on the strapping pin here to prevent warnings being thrown.
> Read more about this [here](https://esphome.io/guides/configuration-types/#pin-schema).

```yaml
i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true
```

We then need to configure both our **DAC** and **ADC** drivers. To keep our configs in sync and avoid confusing changes in the future, we will first add substitutions.

```yaml
substitutions:
  i2s_bps_spk: 16bit
  i2s_bps_mic: 16bit
  i2s_sample_rate_spk: 44100
  i2s_sample_rate_mic: 16000
```

We can then configure our **ADC** and **DAC** drivers and make use of these substitutions.

```yaml
audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic

audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk
```

Once we have our audio drivers configured, we can configure our **audio output** and **audio input** devices. We configure these devices using the same substitutions, which lets us change sample rates and bit depths without risking a mismatch between driver and device.

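
As a rough sketch of what that buys us (the `48000` value is purely illustrative and is not what this build uses): ESPHome expands both `$name` and `${name}`, so editing a single entry under `substitutions` changes every place that references it.

```yaml
# Illustrative fragment only: 48000 is an assumed value, not the one used for Katchi.
substitutions:
  i2s_bps_spk: 16bit
  i2s_sample_rate_spk: 48000              # changed in exactly one place

audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: ${i2s_bps_spk}       # ${name} is equivalent to $name
    sample_rate: ${i2s_sample_rate_spk}   # expands to 48000
```

Any speaker entry that references `$i2s_sample_rate_spk` picks up the same value, so the driver and the device stay in step. The actual microphone and speaker configuration for this build, using the original values, is:
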
```yaml
microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external

speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true
```

Altogether, we end up with a long block of configuration that looks like this:

```yaml
i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true

audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic

audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk

microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external

speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true
```

#### Final Configuration

There is a lot more config to go through, and I don't want to go over all of it in this post. You can find all the resources for the ESPHome portion of Katchi in my Gitea repo.

{{< gitea server="https://git.toomuchtaco.net" repo="taco/voice-assistant" >}}

### Designing a kobold