+++
date = '2026-05-17T23:57:15-06:00'
draft = false
title = "Katchi, a dragon's best friend"
summary = "Building a smart speaker with ESPHome for Home Assistant, and it just so happens to look like a kobold."
tags = ['kobold', 'esp32']
+++
A smart-home for a Dragon
{{< typeit tag=h3 speed=50 breakLines=false loop=true >}}
"It's the Future..." "Dumb homes are so 2010" "Is all of this really necessary?" — Concerned Friends {{< /typeit >}}
Smart-homes
The present state of smart-home choices is fairly acceptable. You have your major players, Google, Apple, and Amazon, and their associated services like Google Home or Alexa. These systems are fairly easy to set up: plug in the new device, type in some credentials or respond to a prompt on your phone, and done. Most of these systems rely on a central hub that orchestrates the entire smart home.
But all these systems have one fatal annoyance. They all require access to the internet.
Internet dependency
In recent years, it has become common to run into issues with the major providers. Privacy concerns, outages, and the forced obsolescence of existing systems weighed heavily on me when building my first smart home. Sure, the big players make it easy to set up and use, but for me the non-monetary cost was just too great. Besides the limitations in software, knowing that an internet outage or, god forbid, a provider outage would leave me shit out of luck when trying to turn off my lights turned me away from the major providers.
So what did I use?
After spending a lot of time frustrated with my options and struggling to automate my smart home the way I wanted, I went down the rabbit hole of alternatives and found Home Assistant.

Unlike the big-name smart-home platforms, Home Assistant is a self-hosted option that runs on your own hardware and connects locally to supported devices. It supports a wide range of devices and integrations and is fairly easy to set up.
I won't expound on it much more here, but I will link to the getting started guide, documentation, and community for more information.
So what's the problem?
For all the amazing options that Home Assistant gives us, it has one fairly significant gap: smart speaker integration.
Home Assistant Smart Speaker
The options for Home Assistant smart speakers are quite limited: there is only one official product as of the date of publishing this post.
{{< externalLink url="https://www.home-assistant.io/voice-pe/" >}}
While the Home Assistant Voice PE works decently, it is the only off-the-shelf option for Home Assistant, which, considering all the freedom Home Assistant gives us, feels quite limiting. However, there is a solution.
The Solution
Thankfully, microcontrollers free us from the limitations of existing hardware, specifically the ESP family of microcontrollers.

Using ESPHome, you can create a whole myriad of smart devices based on the ESP32 family of microcontrollers, which offers a diverse range of options that can fit nearly any use case. Think of ESPHome as an alternative to Arduino where, instead of writing C++ code, you write YAML configuration files that describe and configure your ESP device.
Knowing this, I set out to make my own Smart Speaker.
Building a Katchi (smart speaker)
So what does it take to make a smart speaker? You ultimately need a few key things: a speaker, a microphone, and a Wi-Fi-enabled microcontroller. For my purposes, I decided I also wanted a screen so I could give my Katchi a little more personality. My main requirement for the display was that it be circular, as my intent was to use it as the eye of my smart speaker.
Waveshare ESP32-S3
I ended up landing on the Waveshare ESP32-S3 1.75inch AMOLED Round Touch Display Development Board. Despite being quite a mouthful, this handy little device is absolutely packed with sensors and features, as well as a glorious AMOLED round display.
{{< gallery >}} {{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-1.jpg" alt="Gallery image 1" figureClass="grid-w33" >}} {{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-2.jpg" alt="Gallery image 2" figureClass="grid-w33" >}} {{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-3.jpg" alt="Gallery image 3" figureClass="grid-w33" >}} {{< /gallery >}}
One of the fundamental things to check when considering ESP devices is which components and associated drivers are available in ESPHome. The Waveshare device I picked has the following components and support status:
| Device | Purpose | Supported in ESPHome |
|---|---|---|
| TCA9554 | GPIO expander for additional interfaces | Yes |
| ES7210 | ADC for microphones | Yes |
| ES8311 | DAC for speaker | Yes |
| CO5300 | Display controller for AMOLED | Yes |
| CST9217 | Touchscreen controller | No |
To fill the support gap for the touchscreen, we can use an external driver.
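External drivers are typically pulled in through ESPHome's `external_components` block. The following is a minimal sketch, assuming the driver has been copied into a folder next to the device YAML. The folder name `components` and the component name `cst9217` are placeholders for illustration, not the actual driver used for Katchi.

```yaml
# Sketch only: loads a custom component from a local folder.
# "components" and "cst9217" are placeholder names; substitute the folder and
# component name of whichever CST9217 driver you end up using.
external_components:
  - source:
      type: local
      path: components
    components: [ cst9217 ]
```

A `git` or `github://` source works the same way if the driver lives in a repository rather than on disk.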
Configuration
Since ESPHome uses a YAML configuration file to define the device, configuring it is fairly straightforward.
Basic Configuration
We first need to start with the basic configuration of the device. When setting up the esp32 configuration, it is crucial to know the flash size and CPU frequency; otherwise your device may not run correctly. The device I am using has 16MB of flash and a 240MHz CPU. We also use the esp-idf framework. This is the preferred framework for ESP devices, as the Arduino framework is not as feature-rich and is not supported on some newer devices.
esphome:
  name: kobold
  friendly_name: Kobold
esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHZ
  framework:
    type: esp-idf
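For the device to show up in Home Assistant, it also needs network access and the native API. The post does not show that part, so the block below is only a sketch of the usual ESPHome boilerplate. The secret names `wifi_ssid` and `wifi_password` are assumptions and would need matching entries in your `secrets.yaml`.

```yaml
# Assumed additions, not taken from the configuration shown in this post.
logger:

# Native API that Home Assistant uses to talk to the device
api:

# Over-the-air updates so the board does not need USB after the first flash
ota:
  - platform: esphome

wifi:
  ssid: !secret wifi_ssid          # hypothetical secret name
  password: !secret wifi_password  # hypothetical secret name
```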
We also need to set up PSRAM, the I2C bus, and the SPI bus. This requires a firm understanding of the GPIO pins and their associated functions. Waveshare provides the following pinout diagram for the device: ESP32-S3-Touch-AMOLED-1.75.pdf. We will rely on this document extensively for the rest of the configuration.
PSRAM is important for making sure that the device does not run out of memory, and its config is quite simple.
psram:
  mode: octal
  speed: 80MHz
The I2C bus is used for the touchscreen and for other future components. It is an important communication protocol for microcontrollers, as it allows a large number of sensors and devices to share the same bus.
i2c:
  sda: GPIO15
  scl: GPIO14
  scan: true
  id: bus_a
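To illustrate how additional devices share the same bus, here is a sketch that attaches the board's TCA9554 GPIO expander to `bus_a`. It assumes the expander is driven through ESPHome's `pca9554` component (the two parts use the same register layout). The ID `io_expander` is hypothetical and the address is an assumption; since `scan: true` is set, the boot log should list the addresses actually present.

```yaml
# Sketch only: a second device on the shared I2C bus.
pca9554:
  - id: io_expander   # hypothetical ID
    i2c_id: bus_a     # the bus defined above
    address: 0x20     # assumed address; confirm it against the I2C scan log
```

Each further I2C device is added the same way, pointing at `bus_a` with its own address.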
The SPI bus is essential for the display module, which communicates via quad SPI: functionally a serial bus with four data lines.
spi:
  - id: spi_bus
    clk_pin: GPIO2
    mosi_pin: GPIO1
    miso_pin:
      number: GPIO3
      ignore_strapping_warning: true
  - id: quad_spi_bus
    type: quad
    clk_pin: GPIO38
    data_pins:
      - GPIO4
      - GPIO5
      - GPIO6
      - GPIO7
Display Configuration
For our basic display configuration we will use the mipi_spi display driver. This driver requires the quad SPI bus to be configured, as well as the correct data_rate. You can experiment with the data rate for a quad SPI display, as it affects how quickly the display refreshes and draws images.
display:
  - platform: mipi_spi
    id: disp1
    model: CO5300
    bus_mode: quad
    reset_pin: GPIO39
    cs_pin: GPIO12
    data_rate: 80MHz
    dimensions:
      height: 466
      width: 466
      offset_width: 6
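With the display defined, a drawing `lambda` can put something on screen. The sketch below is the same display entry with a lambda added that draws a simple eye in the centre of the panel. The radii and colours are arbitrary choices for illustration and not Katchi's actual eye, which may well be rendered differently.

```yaml
display:
  - platform: mipi_spi
    id: disp1
    model: CO5300
    bus_mode: quad
    reset_pin: GPIO39
    cs_pin: GPIO12
    data_rate: 80MHz
    dimensions:
      height: 466
      width: 466
      offset_width: 6
    # Illustrative drawing code; sizes and colours are assumptions.
    lambda: |-
      // Centre of the round panel
      const int cx = it.get_width() / 2;
      const int cy = it.get_height() / 2;
      it.fill(Color::BLACK);
      it.filled_circle(cx, cy, 180, Color(255, 200, 0));  // iris
      it.filled_circle(cx, cy, 70, Color::BLACK);         // pupil
```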
Audio Configuration
Our audio configuration is quite a bit more complex. It requires its own I2S bus as well as the DAC and ADC configs. Finally, we need to configure the audio components themselves.
Our audio I2S bus is simpler than our quad SPI bus.
We ignore the strapping pin warning on GPIO45 here to prevent warnings being thrown. Read more about this here
i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true
We then need to configure both our DAC and ADC drivers. To keep our configs in sync and avoid confusing changes in the future, we will first add substitutions.
substitutions:
  i2s_bps_spk: 16bit
  i2s_bps_mic: 16bit
  i2s_sample_rate_spk: 44100
  i2s_sample_rate_mic: 16000
We can then configure our ADC and DAC drivers and make use of these substitutions.
audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic
audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk
Once we have our audio drivers configured, we can configure our audio output and input devices. We use the same substitutions here, which lets us change sample rates and bit depths without risking a mismatch between driver and device.
microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external
speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true
All together we end up with a long block of configuration that looks like this:
i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true
audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic
audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk
microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external
speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true
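The microphone and speaker do nothing on their own until something consumes them. As a rough sketch of the next step, ESPHome's `voice_assistant` component can be pointed at the `box_mic` and `box_speaker` IDs defined above. This assumes the `api:` component is enabled and that wake word detection is handled by the Home Assistant Assist pipeline; the full Katchi configuration may wire this up differently.

```yaml
# Sketch only: streams audio to Home Assistant's Assist pipeline.
voice_assistant:
  microphone: box_mic
  speaker: box_speaker
  use_wake_word: true   # assumes wake word detection runs in Home Assistant
  on_client_connected:
    # Start listening as soon as Home Assistant connects
    - voice_assistant.start_continuous:
  on_client_disconnected:
    - voice_assistant.stop:
```

On-device wake word detection through `micro_wake_word` is an alternative that avoids streaming audio continuously.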
Final Configuration
There is a lot more config to go through, and I don't want to go over all of it in this post. You can find all the resources for the ESPHome portion of Katchi at my Gitea repo.
{{< gitea server="https://git.toomuchtaco.net" repo="taco/voice-assistant" >}}