+++
date = '2026-05-17T23:57:15-06:00'
draft = false
title = "Katchi, a dragon's best friend"
summary = "Building a smart speaker with ESPHome for Home Assistant, and it just so happens to look like a kobold."
tags = ['kobold', 'esp32']
+++

A smart-home for a Dragon

{{< typeit
  tag=h3
  speed=50
  breakLines=false
  loop=true
>}}
"It's the Future..."
"Dumb homes are so 2010"
"Is all of this really necessary?"
- Concerned Friends
{{< /typeit >}}

Smart-homes

The present state of smart-home choices is fairly acceptable. You have your major players (Google, Apple and Amazon) and their associated services like Google Home or Alexa. These systems are fairly easy to set up: plug in the new device, type in some credentials or answer a prompt on your phone, and you're done. Most of these systems rely on a central hub that orchestrates the entire smart home.

But all these systems have one fatal annoyance. They all require access to the internet.

Internet dependency

In recent years, it has become common to run into issues with major providers. Privacy concerns, outages and the forced obsolescence of existing systems put a lot of pressure on me when building my first smart home. Sure, the big players make it easy to set up and use, but for me the non-monetary cost was just too great. Beyond the limitations in software, knowing that an internet outage, or god forbid a provider outage, would leave me shit out of luck when turning off my lights is what turned me away from the major providers.

So what did I use?

After spending a lot of time frustrated with my options and dealing with the difficulties in automating and doing what I wanted with my smart-home, I went down the rabbit hole of options and found Home Assistant.

Home Assistant

Unlike the big-name smart-homes, Home Assistant is a self-hosted option that runs on your own hardware and locally connects to supported devices. It supports a wide range of devices and integrations and is fairly easy to set up.

I won't expound on it much more here, but I will link to the getting started guide, documentation and community for more information.

So what's the problem?

For all the amazing options that Home Assistant gives us, it has one fairly significant miss: smart speaker integration.

Home Assistant Smart Speaker

The options for Home Assistant smart speakers are quite limited. There is only one official product as of the date of publishing this post.

{{< externalLink url="https://www.home-assistant.io/voice-pe/" >}}

While the Home Assistant Voice PE works decently, it is the only off-the-shelf option for Home Assistant, which, considering all the freedom Home Assistant gives us, feels quite limiting. However, there is a solution.

The Solution

Thankfully, microcontrollers free us from the limitations of existing hardware, specifically the ESP family of microcontrollers.

ESPHome

Using ESPHome you can create a whole myriad of smart devices based on the ESP32 microcontroller. It provides a very diverse family of options that can fit nearly any use-case. Think of it as an alternative to Arduino, where instead of writing C/C++ code you write YAML configuration files that define and configure your ESP device.

Knowing this, I set out to make my own Smart Speaker.

Building a Katchi (smart speaker)

So what does it take to make a smart speaker? You ultimately need a few key things: a speaker, a microphone and a Wi-Fi-enabled microcontroller. For my purposes I decided I also wanted a screen so I could give my Katchi a little more personality. The main requirement I had for the display was for it to be circular, as my intent was to use the display as the eye for my smart speaker.

Waveshare ESP32-S3

I ended up landing on the Waveshare ESP32-S3 1.75inch AMOLED Round Touch Display Development Board. Despite being quite a mouthful, this handy little device is absolutely packed with sensors and features, as well as a glorious AMOLED round display.

{{< gallery >}} {{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-1.jpg" alt="Gallery image 1" figureClass="grid-w33" >}} {{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-2.jpg" alt="Gallery image 2" figureClass="grid-w33" >}} {{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-3.jpg" alt="Gallery image 3" figureClass="grid-w33" >}} {{< /gallery >}}

One of the fundamental things to check when considering ESP devices is which components and associated drivers are available in ESPHome. The Waveshare device I picked has the following components, with their ESPHome support:

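| Device  | Purpose                                 | Supported |
|---------|-----------------------------------------|-----------|
| TCA9554 | GPIO expander for additional interfaces | Yes       |
| ES7210  | ADC for microphones                     | Yes       |
| ES8311  | DAC for speaker                         | Yes       |
| CO5300  | Display controller for AMOLED           | Yes       |
| CST9217 | Touchscreen control device              | No        |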

To fill in the support gap for the touchscreen, we can make use of an external driver to get it working.
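As a rough sketch of how an external driver gets pulled in, ESPHome's `external_components` block can point at a Git repository. The repository and the component name `cst9217` below are placeholders I made up for illustration, not the actual driver used for Katchi, so swap in whichever driver you end up using.

```yaml
external_components:
  # Hypothetical source: replace with the repository that hosts the driver
  - source: github://example-user/esphome-cst9217
    # Hypothetical component name provided by that repository
    components: [cst9217]
    # How often ESPHome re-fetches the repository
    refresh: 1d
```

Once the component is loaded this way, it is configured like any built-in platform, using whatever options its own documentation describes.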

Configuration

Since ESPHome uses a YAML configuration file to define the device, configuring the device is fairly straightforward.

Basic Configuration

We first need to start with the basic configuration of the device. When setting up the esp32 configuration, it is crucial to know the flash size and CPU frequency, otherwise your device may not run correctly. The device I am using has 16MB of flash and a 240MHz CPU. We also use the esp-idf framework. This is the preferred framework for ESP devices, as the Arduino framework is not as feature-rich and is no longer supported by newer devices.

esphome:
  name: kobold
  friendly_name: Kobold

esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHZ
  framework:
    type: esp-idf
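For the device to actually show up in Home Assistant it also needs network and API settings. The following is a minimal sketch, not copied from the Katchi config; the secret names `wifi_ssid`, `wifi_password` and `api_key` are assumptions and would need to exist in your own `secrets.yaml`.

```yaml
# Assumed secret names; define them in secrets.yaml
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Native API used by Home Assistant to talk to the device
api:
  encryption:
    key: !secret api_key

# Over-the-air updates so the device does not need USB after the first flash
ota:
  - platform: esphome

# Serial and network logging, handy while bringing up new hardware
logger:
```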

We also need to set up psram, the i2c bus and the SPI bus. This requires a firm understanding of the GPIO pins and their associated functions. For this device, Waveshare provides the following pinout diagram: ESP32-S3-Touch-AMOLED-1.75.pdf. We will rely on this document extensively for the rest of the configuration.

The psram is important for making sure that the device does not run out of memory, and its config is quite simple.

psram:
  mode: octal
  speed: 80MHz

The i2c bus is used for the touchscreen and for other future components. It is an important communication protocol for microcontrollers, as it allows a large number of sensors and devices to connect to the same bus.

i2c:
  sda: GPIO15
  scl: GPIO14
  scan: true
  id: bus_a

The SPI bus is essential for the display module, as it communicates via quad SPI, which is functionally a serial bus with four data lines.

spi:
  - id: spi_bus
    clk_pin: GPIO2
    mosi_pin: GPIO1
    miso_pin:
      number: GPIO3
      ignore_strapping_warning: true
  - id: quad_spi_bus
    type: quad
    clk_pin: GPIO38
    data_pins:
      - GPIO4
      - GPIO5
      - GPIO6
      - GPIO7

Display Configuration

For our basic display configuration we will use the mipi_spi display driver. This driver specifically requires the quad SPI bus to be configured as well as the correct data_rate. You can play with the data rate for a quad SPI display as it will impact how the display refreshes and draws images.

display:
  - platform: mipi_spi
    id: disp1
    model: CO5300
    bus_mode: quad
    reset_pin: GPIO39
    cs_pin: GPIO12
    data_rate: 80MHz
    dimensions:
      height: 466
      width: 466
      offset_width: 6
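To give an idea of how the round display could become an eye, here is a minimal sketch that adds a drawing `lambda` to the display entry above. This is an illustration under my own assumptions rather than the rendering used in the final Katchi config; the radii and colours are arbitrary, and a more animated eye would likely use a different approach.

```yaml
display:
  - platform: mipi_spi
    id: disp1
    model: CO5300
    bus_mode: quad
    reset_pin: GPIO39
    cs_pin: GPIO12
    data_rate: 80MHz
    dimensions:
      height: 466
      width: 466
      offset_width: 6
    # Assumption: a static eye drawn from three concentric circles
    lambda: |-
      // Centre of the panel
      const int cx = it.get_width() / 2;
      const int cy = it.get_height() / 2;
      // White of the eye, filling most of the round panel
      it.filled_circle(cx, cy, 200, Color(255, 255, 255));
      // Amber iris
      it.filled_circle(cx, cy, 110, Color(255, 160, 0));
      // Black pupil
      it.filled_circle(cx, cy, 50, Color(0, 0, 0));
```

Drawing from the centre outwards in decreasing radius means each circle simply paints over the previous one, so no masking is needed.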

Audio Configuration

Our audio configuration is quite a bit more complex. It requires that we configure its own I2S bus as well as the DAC and ADC drivers. Finally, we need to configure the audio components themselves.

Our audio I2S bus is simpler than our quad SPI bus.

We ignore the strapping warning on GPIO45 here to prevent warnings being thrown. Read more about this here

i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true

We then need to configure both our DAC and ADC drivers. To keep our configs in sync and avoid confusion when making changes in the future, we will first add substitutions.

substitutions:
  i2s_bps_spk: 16bit
  i2s_bps_mic: 16bit
  i2s_sample_rate_spk: 44100
  i2s_sample_rate_mic: 16000

We can then configure our ADC and DAC drivers and make use of these substitutions.

audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic

audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk

Once we have our audio drivers configured, we can configure our audio output and audio input devices. We configure these devices using the same substitutions, which lets us change sample rates and bit depths without risking a mismatch between driver and device.

microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external

speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true

All together we end up with a long block of configuration that looks like this:

i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true

audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic

audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk

microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external

speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true
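With a microphone and speaker defined, the remaining step is wiring them into a voice pipeline. The sketch below shows one way this might look using ESPHome's `micro_wake_word` and `voice_assistant` components. It is an assumption on my part rather than the exact Katchi config; the wake word model and the tuning values are examples only.

```yaml
micro_wake_word:
  microphone: box_mic
  models:
    # Example model; pick whichever wake word you prefer
    - model: okay_nabu
  on_wake_word_detected:
    # Hand over to the voice assistant once the wake word is heard
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  id: va
  microphone: box_mic
  speaker: box_speaker
  # Example tuning values, adjust to taste
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  on_client_connected:
    # Only listen for the wake word while Home Assistant is connected
    - micro_wake_word.start:
  on_client_disconnected:
    - micro_wake_word.stop:
```

The ids `box_mic` and `box_speaker` are the ones defined above, which is why the microphone is kept at a 16000 Hz, 16bit sample format that voice pipelines expect.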

Final Configuration

There is a lot more config to go through, and I don't want to go over all of it in this blog. You can find all resources for the ESPHome portion of Katchi at my Gitea repo.

{{< gitea server="https://git.toomuchtaco.net" repo="taco/voice-assistant" >}}

Designing a kobold