More updates to my first blog post.

2026-05-18 18:25:30 -06:00
parent 58bf25f5fe
commit afe2153413


@@ -20,8 +20,8 @@ loop=true
### Smart-homes
The present state of smart-home choices is fairly acceptable. You have your major players, **Google**, **Apple**, **Amazon** and
their associated services like Google Home or **Alexa**. These systems are fairly easy to set up; plug in the new device,
type in some credentials or type a prompt on your phone, and done. Most of these systems rely on a central hub that
orchestrates the entire smart home.
@@ -39,10 +39,10 @@ turning off my lights turned me away from major providers.
After spending a lot of time frustrated with my options and dealing with the difficulties in automating and doing what
I wanted with my smart-home, I went down the rabbit hole of options and found
**[Home Assistant](https://www.home-assistant.io/)**.
![Home Assistant](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fcommunity-assets.home-assistant.io%2Foriginal%2F4X%2F5%2F0%2Fe%2F50e585faea85010ebb16d3d466f071ef90ec1393.png&f=1&nofb=1&ipt=73955a250f2bf73ba578833607b6a377d67ea436a1562e35b202fb2273b3d35a)
Unlike the big-name smart-homes, **Home Assistant** is a self-hosted option that runs on your own hardware and locally
connects to supported devices. It supports a wide range of [devices and integrations](https://www.home-assistant.io/integrations/?brands=featured)
and is fairly easy to set up.
@@ -51,26 +51,271 @@ and [community](https://community.home-assistant.io/) for more information.
### So what's the problem?
For all the amazing options that **Home Assistant** gives us, it has one fairly significant gap: smart speaker integration.
## Home Assistant Smart Speaker
The options for **Home Assistant** smart speakers are quite limited: there is only one official product as of the date of publishing this post.
{{< externalLink url="https://www.home-assistant.io/voice-pe/" >}}
While the **Home Assistant Voice PE** works decently, it is the only off-the-shelf option for **Home Assistant**, which,
considering all the freedom **Home Assistant** gives us, feels quite limiting. However, there is a solution.
### The Solution
Fortunately, we are not constrained by the limitations of existing hardware, thanks to microcontrollers, specifically the ESP family.
![ESPHome](https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fesphome.io%2Fimages%2Fog.webp&f=1&nofb=1&ipt=fd9bfb5ff1845d2803627ce6224161f86267d81a0f426d077cbae7deaeb75215)
Using **[ESPHome](https://esphome.io)** you can create a whole range of smart devices based on the [ESP32 microcontroller](https://www.espressif.com/en/products/socs/esp32). It provides a very diverse
family of options that can fit nearly any use case. Think of it as an alternative to Arduino, where instead of writing C++ code
you write YAML configuration files that define and configure your ESP device.
Knowing this, I set out to make my own Smart Speaker.
## Building a Katchi (smart speaker)
So what does it take to make a smart speaker? You ultimately need a few key things: a speaker, a microphone, and a
Wi-Fi-enabled microcontroller. For my purposes I decided I also wanted a screen so I could give my Katchi a little
more personality. The main requirement I had for the display was for it to be circular as my intent was to use the display
as the eye for my smart speaker.
### Waveshare ESP32-S3
I ended up landing on the [Waveshare ESP32-S3 1.75inch AMOLED Round Touch Display Development Board](https://www.waveshare.com/esp32-s3-touch-amoled-1.75.htm).
Despite being quite a mouthful, this handy little device is absolutely packed with sensors and features, as well as a
glorious AMOLED round display.
{{< gallery >}}
{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-1.jpg" alt="Gallery image 1" figureClass="grid-w33" >}}
{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-2.jpg" alt="Gallery image 2" figureClass="grid-w33" >}}
{{< figure src="https://www.waveshare.com/media/catalog/product/cache/1/image/800x800/9df78eab33525d08d6e5fb8d27136e95/e/s/esp32-s3-touch-amoled-1.75-3.jpg" alt="Gallery image 3" figureClass="grid-w33" >}}
{{< /gallery >}}
One fundamental thing to check when considering ESP devices is which components and associated drivers
are available in **ESPHome**. The Waveshare device I picked has the following components, with their support status:
| Device | Purpose | Supported |
|---------|-----------------------------------------|--------------------------------------------------------|
| TCA9554 | GPIO expander for additional interfaces | [Yes](https://esphome.io/components/pca9554/) |
| ES7210 | ADC for microphones | [Yes](https://esphome.io/components/audio_adc/es7210/) |
| ES8311 | DAC for speaker | [Yes](https://esphome.io/components/audio_dac/es8311/) |
| CO5300 | Display controller for AMOLED | [Yes](https://esphome.io/components/display/mipi_spi/) |
| CST9217 | Touchscreen control device | No |
To fill in the support gap for the touch screen, we can make use of an [external driver](https://github.com/shelson/esphome-cst9217) to handle making the touchscreen
work.
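As a rough sketch, pulling that driver into a device config would use ESPHome's `external_components` mechanism. The `refresh` interval here is my own choice, and the touchscreen platform name and its options are defined by that repository, so they are deliberately left out rather than guessed:

```yaml
# Sketch only: fetch the community CST9217 driver from GitHub.
# The touchscreen entry itself should follow the repository's README.
external_components:
  - source: github://shelson/esphome-cst9217
    refresh: 1d  # assumption: re-check the repository once a day
```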
### Configuration
Since **ESPHome** uses a YAML configuration file to define the device, configuring it is fairly straightforward.
#### Basic Configuration
We first need to start with the basic configuration of the device. When setting up the esp32 configuration, it is crucial
to know the flash size and CPU frequency, otherwise your device may not run correctly. The device I am using
has **16MB of flash** and a **240MHz CPU**. We also use the esp-idf framework. This is the preferred framework for
ESP devices, as the Arduino framework is not as feature-rich and is not supported on newer devices.
```yaml
esphome:
  name: kobold
  friendly_name: Kobold
esp32:
  board: esp32-s3-devkitc-1
  flash_size: 16MB
  cpu_frequency: 240MHZ
  framework:
    type: esp-idf
```
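The block above leaves out networking, which the device needs before Home Assistant can see it. A minimal sketch of that part, assuming the credentials live in a `secrets.yaml` under the illustrative keys `wifi_ssid` and `wifi_password` (the full config in the repo may differ):

```yaml
# Sketch only: connect to Wi-Fi and expose the device to Home Assistant.
wifi:
  ssid: !secret wifi_ssid          # assumed key name in secrets.yaml
  password: !secret wifi_password  # assumed key name in secrets.yaml
api:      # native API used by Home Assistant
logger:   # serial and network logging
ota:
  - platform: esphome  # allows over-the-air firmware updates
```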
We also need to set up **psram**, the **i2c bus** as well as the **SPI bus**. This will require
a firm understanding of the GPIO pins and their associated functions. For the Waveshare device,
they provide the following pinout diagram: [ESP32-S3-Touch-AMOLED-1.75.pdf](https://files.waveshare.com/wiki/ESP32-S3-Touch-AMOLED-1.75/ESP32-S3-Touch-AMOLED-1.75.pdf)
We will rely on this document extensively for the rest of the configuration.
The **psram** is important for making sure that the device does not run out of memory,
and its config is quite simple.
```yaml
psram:
  mode: octal
  speed: 80MHz
```
The **i2c bus** is used for the touchscreen and for other future components. It is
an important communication protocol for microcontrollers, as it allows a large number of
sensors and devices to share the same bus.
```yaml
i2c:
  sda: GPIO15
  scl: GPIO14
  scan: true
  id: bus_a
```
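The same bus can also host the TCA9554 GPIO expander from the table earlier, through ESPHome's `pca9554` component. A minimal sketch, assuming the expander answers at the component's default address of `0x20` (check the schematic before relying on this) and using `io_expander` as an illustrative id:

```yaml
# Sketch only: register the GPIO expander on the i2c bus defined above.
pca9554:
  - id: io_expander  # hypothetical id, used to reference expander pins
    i2c_id: bus_a
    address: 0x20    # assumption: ESPHome's default address for this component
```

With `scan: true` set on the bus, the addresses that actually respond show up in the device log at boot, which is a quick way to confirm the value.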
The **SPI bus** is essential for the display module, as it communicates via quad SPI, which is functionally
a serial bus with four data lines.
```yaml
spi:
  - id: spi_bus
    clk_pin: GPIO2
    mosi_pin: GPIO1
    miso_pin:
      number: GPIO3
      ignore_strapping_warning: true
  - id: quad_spi_bus
    type: quad
    clk_pin: GPIO38
    data_pins:
      - GPIO4
      - GPIO5
      - GPIO6
      - GPIO7
```
#### Display Configuration
For our basic display configuration we will use the **mipi_spi** display driver. This driver
specifically requires the **quad SPI bus** to be configured as well as the correct `data_rate`.
You can play with the data rate for a **quad SPI display** as it will impact how the display refreshes and draws images.
```yaml
display:
  - platform: mipi_spi
    id: disp1
    model: CO5300
    bus_mode: quad
    reset_pin: GPIO39
    cs_pin: GPIO12
    data_rate: 80MHz
    dimensions:
      height: 466
      width: 466
      offset_width: 6
```
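To check that the display is wired up before building anything fancier, a drawing `lambda` can be added to the same entry. This is only a sketch of a first test: the circle radius and colour are arbitrary values of mine, and the block is meant to replace the display entry above rather than sit next to it:

```yaml
# Sketch only: same display entry with a simple test drawing added.
display:
  - platform: mipi_spi
    id: disp1
    model: CO5300
    bus_mode: quad
    reset_pin: GPIO39
    cs_pin: GPIO12
    data_rate: 80MHz
    dimensions:
      height: 466
      width: 466
      offset_width: 6
    lambda: |-
      // Draw a filled white circle in the centre of the round panel.
      // The radius of 120 px is an arbitrary test value.
      it.filled_circle(it.get_width() / 2, it.get_height() / 2, 120, Color(255, 255, 255));
```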
#### Audio Configuration
Our audio configuration is quite a bit more complex. It requires that we configure its own **I2S bus** as well as the **DAC**
and **ADC** configs. Finally, we need to configure the audio components themselves.
Our **audio I2S bus** is simpler than our **quad SPI bus**.
> We ignore the strapping warning on this pin to prevent warnings being thrown. Read more about this [here](https://esphome.io/guides/configuration-types/#pin-schema)
```yaml
i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true
```
We then need to configure both our **DAC** and **ADC** drivers. To keep our configs in sync and avoid confusing changes
in the future, we will first add substitutions.
```yaml
substitutions:
  i2s_bps_spk: 16bit
  i2s_bps_mic: 16bit
  i2s_sample_rate_spk: 44100
  i2s_sample_rate_mic: 16000
```
We can then configure our **ADC** and **DAC** drivers and make use of these substitutions.
```yaml
audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic
audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk
```
Once we have our audio drivers configured, we can configure our **audio output** and **audio input** devices. We configure
our audio devices using the same substitutions, allowing us to change sample rates and bit depths without risking a
mismatch between driver and device.
```yaml
microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external
speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true
```
Altogether, we end up with a long block of configuration that looks like this:
```yaml
i2s_audio:
  - id: i2s_audio_bus
    i2s_mclk_pin: GPIO42
    i2s_bclk_pin: GPIO9
    i2s_lrclk_pin:
      number: GPIO45
      ignore_strapping_warning: true
audio_adc:
  - platform: es7210
    id: es7210_adc
    bits_per_sample: $i2s_bps_mic
    sample_rate: $i2s_sample_rate_mic
audio_dac:
  - platform: es8311
    id: es8311_dac
    bits_per_sample: $i2s_bps_spk
    sample_rate: $i2s_sample_rate_spk
microphone:
  - platform: i2s_audio
    id: box_mic
    sample_rate: $i2s_sample_rate_mic
    i2s_din_pin: GPIO10
    bits_per_sample: $i2s_bps_mic
    adc_type: external
speaker:
  - platform: i2s_audio
    id: box_speaker
    i2s_dout_pin: GPIO8
    dac_type: external
    sample_rate: $i2s_sample_rate_spk
    bits_per_sample: $i2s_bps_spk
    audio_dac: es8311_dac
    buffer_duration: 90ms
    use_apll: true
```
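With a microphone and speaker defined, the remaining step towards an actual smart speaker is handing both to ESPHome's `voice_assistant` component. The following is only a sketch of the simplest wiring, with `va` as an illustrative id. The real Katchi config may well use additional pieces such as a wake word or media player:

```yaml
# Sketch only: connect the audio devices above to the voice assistant.
voice_assistant:
  id: va                # hypothetical id
  microphone: box_mic   # the i2s microphone defined above
  speaker: box_speaker  # the i2s speaker defined above
```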
#### Final Configuration
There is a lot more config to go through, and I don't want to go over all of it in this blog. You can find all resources
for the ESPHome portion of Katchi at my Gitea repo.
{{< gitea server="https://git.toomuchtaco.net" repo="taco/voice-assistant" >}}
### Designing a kobold