From afe2153413d52568ecf124611138e78fe659d321 Mon Sep 17 00:00:00 2001
From: taco
Date: Mon, 18 May 2026 18:25:30 -0600
Subject: [PATCH] More updates to my first blog post.

---
 content/posts/katchi/index.md | 265 ++++++++++++++++++++++++++++++++--
 1 file changed, 255 insertions(+), 10 deletions(-)

diff --git a/content/posts/katchi/index.md b/content/posts/katchi/index.md
index c2528d8..d25ad97 100644
--- a/content/posts/katchi/index.md
+++ b/content/posts/katchi/index.md
@@ -20,8 +20,8 @@ loop=true

 ### Smart-homes

-The present state of smart-home choices is fairly acceptable. You have your major players, Google, Apple, Amazon and
-their associated services like Google Home or Alexa. These systems are fairly easy to set up; plug in the new device,
+The present state of smart-home choices is fairly acceptable. You have your major players, **Google**, **Apple**, **Amazon** and
+their associated services like Google Home or **Alexa**. These systems are fairly easy to set up; plug in the new device,
 type in some credentials or type a prompt on your phone, and done. Most of these systems rely on a central hub that
 orchestrates the entire smart home.

@@ -39,10 +39,10 @@ turning off my lights turned me away from major providers.

 After spending a lot of time frustrated with my options and dealing with the difficulties in automating and doing what
 I wanted with my smart-home, I went down the rabbit hole of options and found
-[Home Assistant](https://www.home-assistant.io/).
+**[Home Assistant](https://www.home-assistant.io/)**.
 ![Home Assistant](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fcommunity-assets.home-assistant.io%2Foriginal%2F4X%2F5%2F0%2Fe%2F50e585faea85010ebb16d3d466f071ef90ec1393.png&f=1&nofb=1&ipt=73955a250f2bf73ba578833607b6a377d67ea436a1562e35b202fb2273b3d35a)

-Unlike the big-name smart-homes, Home Assistant is a self-hosted option that runs on your own hardware and locally
+Unlike the big-name smart-homes, **Home Assistant** is a self-hosted option that runs on your own hardware and locally
 connects to supported devices. It supports a wide range of [devices and integrations](https://www.home-assistant.io/integrations/?brands=featured)
 and is fairly easy to set up.

@@ -51,26 +51,271 @@ and [community](https://community.home-assistant.io/) for more information.

 ### So what's the problem?

-Of all the amazing options that Home Assistant gives us, it has a fairly significant miss; that being Smart Speaker integration.
+Of all the amazing options that **Home Assistant** gives us, it has a fairly significant miss; that being Smart Speaker integration.

 ## Home Assistant Smart Speaker

-The options for Home Assistant smart speakers are quite limited, they only offer one official product as of the date of publishing this post.
+The options for **Home Assistant** smart speakers are quite limited, they only offer one official product as of the date of publishing this post.

 {{< externalLink url="https://www.home-assistant.io/voice-pe/" >}}

-While the Home Assistant Voice PE works decently, it is the only off-the-shelf option for Home Assistant which considering
-all the freedom Home Assistant gives us, feels quite limiting. However, there is a solution.
+While the **Home Assistant Voice PE** works decently, it is the only off-the-shelf option for **Home Assistant** which considering
+all the freedom **Home Assistant** gives us, feels quite limiting. However, there is a solution.
 ### The Solution

 Thankfully we are not constrained by the limitations of existing hardware thanks to microcontrollers, specifically the ESP family of microcontrollers.

 ![ESPHome](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fesphome.io%2Fimages%2Fog.webp&f=1&nofb=1&ipt=fd9bfb5ff1845d2803627ce6224161f86267d81a0f426d077cbae7deaeb75215)

-Using [ESPHome](https://esphome.io) you can create a whole myriad of smart devices based on the [ESP32 microcontroller](https://www.espressif.com/en/products/socs/esp32). It provides a very diverse
+Using **[ESPHome](https://esphome.io)** you can create a whole myriad of smart devices based on the [ESP32 microcontroller](https://www.espressif.com/en/products/socs/esp32). It provides a very diverse
 family of options that can fit nearly any use-case. Think of it as an alternative to Arduino, where instead of writing C code you can write yaml configuration files that dictate and configure your ESP device.

 Knowing this, I set out to make my own Smart Speaker.

-## Katchi the Kobold Smart Speaker
\ No newline at end of file
+## Building a Katchi (smart speaker)
+
+So what does it take to make a smart speaker? You ultimately need a few key things: a speaker, a microphone and a
+Wi-Fi-enabled microcontroller. For my purposes I decided I also wanted a screen so I could give my Katchi a little
+more personality. The main requirement I had for the display was that it be circular, as my intent was to use the display
+as the eye for my smart speaker.
+
+### Waveshare ESP32-S3
+
+I ended up landing on the [Waveshare ESP32-S3 1.75inch AMOLED Round Touch Display Development Board](https://www.waveshare.com/esp32-s3-touch-amoled-1.75.htm).
+Despite being quite a mouthful, this handy little device is absolutely packed with sensors and features, as well as a
+glorious AMOLED round display.
+
+{{< gallery >}}
+{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-1.jpg" alt="Gallery image 1" figureClass="grid-w33" >}}
+{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-2.jpg" alt="Gallery image 2" figureClass="grid-w33" >}}
+{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-3.jpg" alt="Gallery image 3" figureClass="grid-w33" >}}
+{{< /gallery >}}
+
+One of the fundamental things to check when considering ESP devices is which components and associated drivers
+are available in **ESPHome**. The Waveshare device I picked has the following components and support status:
+
+| Device  | Purpose                                 | Supported                                              |
+|---------|-----------------------------------------|--------------------------------------------------------|
+| TCA9554 | GPIO expander for additional interfaces | [Yes](https://esphome.io/components/pca9554/)          |
+| ES7210  | ADC for microphones                     | [Yes](https://esphome.io/components/audio_adc/es7210/) |
+| ES8311  | DAC for speaker                         | [Yes](https://esphome.io/components/audio_dac/es8311/) |
+| CO5300  | Display controller for AMOLED           | [Yes](https://esphome.io/components/display/mipi_spi/) |
+| CST9217 | Touchscreen control device              | No                                                     |
+
+To fill the support gap for the touchscreen, we can use an [external driver](https://github.com/shelson/esphome-cst9217) to make the touchscreen
+work.
+
+### Configuration
+
+Since **ESPHome** uses a YAML configuration file to define the device, configuring the device is fairly straightforward.
+
+#### Basic Configuration
+
+We first need to start with the basic configuration of the device.
+When setting up the ESP32 configuration, it is crucial
+to know the flash size and CPU frequency, otherwise your device may not run correctly. The device I am using
+has **16MB of flash** and a **240MHz CPU**. We also use the ESP-IDF framework. This is the preferred framework for
+ESP devices as the Arduino framework is not as feature-rich and is no longer supported by newer devices.
+
+```yaml
+esphome:
+  name: kobold
+  friendly_name: Kobold
+
+esp32:
+  board: esp32-s3-devkitc-1
+  flash_size: 16MB
+  cpu_frequency: 240MHZ
+  framework:
+    type: esp-idf
+```
+
+We also need to set up **psram**, the **i2c bus** as well as the **SPI bus**. This will require
+a firm understanding of the GPIO pins and their associated functions. For this device,
+Waveshare provides the following pinout diagram: [ESP32-S3-Touch-AMOLED-1.75.pdf](https://files.waveshare.com/wiki/ESP32-S3-Touch-AMOLED-1.75/ESP32-S3-Touch-AMOLED-1.75.pdf).
+We will rely on this document extensively for the rest of the configuration.
+
+The **psram** is important for making sure that the device does not run out of memory,
+and its config is quite simple.
+
+```yaml
+psram:
+  mode: octal
+  speed: 80MHz
+```
+
+The **i2c bus** is used for the touchscreen and for other future components. I2C is
+an important communication protocol for microcontrollers as it allows a large number of
+sensors and devices to share the same bus.
+
+```yaml
+i2c:
+  sda: GPIO15
+  scl: GPIO14
+  scan: true
+  id: bus_a
+```
+
+The **SPI bus** is essential for the display module as it communicates via quad SPI, which is functionally
+a serial bus with four data lines.
+
+```yaml
+spi:
+  - id: spi_bus
+    clk_pin: GPIO2
+    mosi_pin: GPIO1
+    miso_pin:
+      number: GPIO3
+      ignore_strapping_warning: true
+  - id: quad_spi_bus
+    type: quad
+    clk_pin: GPIO38
+    data_pins:
+      - GPIO4
+      - GPIO5
+      - GPIO6
+      - GPIO7
+```
+
+#### Display Configuration
+
+For our basic display configuration we will use the **mipi_spi** display driver.
+This driver
+specifically requires the **quad SPI bus** to be configured as well as the correct `data_rate`.
+You can experiment with the data rate for a **quad SPI display** as it affects how quickly the display refreshes and draws images.
+```yaml
+display:
+  - platform: mipi_spi
+    id: disp1
+    model: CO5300
+    bus_mode: quad
+    reset_pin: GPIO39
+    cs_pin: GPIO12
+    data_rate: 80MHz
+    dimensions:
+      height: 466
+      width: 466
+      offset_width: 6
+```
+
+#### Audio Configuration
+
+Our audio configuration is quite a bit more complex. It requires us to configure its own **I2S bus** as well as the **DAC**
+and **ADC** configs. Finally, we need to configure the audio components themselves.
+
+Our **audio I2S bus** is simpler than our **quad SPI bus**.
+
+> We set `ignore_strapping_warning` on the strapping pin here to prevent warnings being thrown. Read more about this [here](https://esphome.io/guides/configuration-types/#pin-schema).
+
+```yaml
+i2s_audio:
+  - id: i2s_audio_bus
+    i2s_mclk_pin: GPIO42
+    i2s_bclk_pin: GPIO9
+    i2s_lrclk_pin:
+      number: GPIO45
+      ignore_strapping_warning: true
+```
+
+We then need to configure both our **DAC** and **ADC** drivers. To keep our configs in sync and avoid confusing changes
+in the future, we will first add substitutions.
+
+```yaml
+substitutions:
+  i2s_bps_spk: 16bit
+  i2s_bps_mic: 16bit
+  i2s_sample_rate_spk: 44100
+  i2s_sample_rate_mic: 16000
+```
+
+We can then configure our **ADC** and **DAC** drivers and make use of these substitutions.
+
+```yaml
+audio_adc:
+  - platform: es7210
+    id: es7210_adc
+    bits_per_sample: $i2s_bps_mic
+    sample_rate: $i2s_sample_rate_mic
+
+audio_dac:
+  - platform: es8311
+    id: es8311_dac
+    bits_per_sample: $i2s_bps_spk
+    sample_rate: $i2s_sample_rate_spk
+```
+
+Once we have our audio drivers configured, we can configure our **audio output** and **audio input** devices.
+We configure
+our audio devices using the same substitutions, allowing us to change sample rates and bit depths without risking a
+mismatch between driver and device.
+
+```yaml
+microphone:
+  - platform: i2s_audio
+    id: box_mic
+    sample_rate: $i2s_sample_rate_mic
+    i2s_din_pin: GPIO10
+    bits_per_sample: $i2s_bps_mic
+    adc_type: external
+
+speaker:
+  - platform: i2s_audio
+    id: box_speaker
+    i2s_dout_pin: GPIO8
+    dac_type: external
+    sample_rate: $i2s_sample_rate_spk
+    bits_per_sample: $i2s_bps_spk
+    audio_dac: es8311_dac
+    buffer_duration: 90ms
+    use_apll: true
+```
+
+Altogether we end up with a long block of configuration that looks like this:
+
+```yaml
+i2s_audio:
+  - id: i2s_audio_bus
+    i2s_mclk_pin: GPIO42
+    i2s_bclk_pin: GPIO9
+    i2s_lrclk_pin:
+      number: GPIO45
+      ignore_strapping_warning: true
+
+audio_adc:
+  - platform: es7210
+    id: es7210_adc
+    bits_per_sample: $i2s_bps_mic
+    sample_rate: $i2s_sample_rate_mic
+
+audio_dac:
+  - platform: es8311
+    id: es8311_dac
+    bits_per_sample: $i2s_bps_spk
+    sample_rate: $i2s_sample_rate_spk
+
+microphone:
+  - platform: i2s_audio
+    id: box_mic
+    sample_rate: $i2s_sample_rate_mic
+    i2s_din_pin: GPIO10
+    bits_per_sample: $i2s_bps_mic
+    adc_type: external
+
+speaker:
+  - platform: i2s_audio
+    id: box_speaker
+    i2s_dout_pin: GPIO8
+    dac_type: external
+    sample_rate: $i2s_sample_rate_spk
+    bits_per_sample: $i2s_bps_spk
+    audio_dac: es8311_dac
+    buffer_duration: 90ms
+    use_apll: true
+```
+
+#### Final Configuration
+
+There is a lot more config to go through, and I don't want to go over all of it in this post. You can find all resources
+for the ESPHome portion of Katchi at my Gitea repo.
+
+{{< gitea server="https://git.toomuchtaco.net" repo="taco/voice-assistant" >}}
+
+### Designing a kobold
\ No newline at end of file