Mirror of https://github.com/TaterTotterson/microWakeWord-Trainer-Nvidia-Docker.git (synced 2026-06-12 20:10:19 -06:00)
<p align="center">
<img
src="https://github.com/user-attachments/assets/cfcb4fe2-93c5-4e58-b90d-ca2ffce310d1"
width="300"
/>
</p>
<h3 align="center">
<a href="https://taterassistant.com">taterassistant.com</a>
</h3>

Train custom microWakeWord models in Docker with NVIDIA/CUDA acceleration, generated Piper samples, device-captured samples, reviewed false-wake negatives, live training logs, and ESPHome firmware flashing.

Real samples come from device-captured wake audio, close misses, or manual uploads. Every saved sample is normalized to `16 kHz / mono / 16-bit PCM WAV` before training.

---
## Docker Image

```bash
docker pull ghcr.io/tatertotterson/microwakeword:latest
```

---
## Run The Container

```bash
docker run -d \
  --gpus all \
  --network host \
  -e REC_PORT=8789 \
  -v $(pwd):/data \
  ghcr.io/tatertotterson/microwakeword:latest
```

The flags:

- `--gpus all` enables GPU acceleration.
- `--network host` lets the container receive mDNS/zeroconf traffic for ESPHome auto-detect.
- `-e REC_PORT=8789` sets the trainer web UI and captured-audio port. Change this value if `8789` is already in use.
- `-v $(pwd):/data` persists models, downloaded voices, datasets, samples, and firmware caches.

Host networking is recommended for the Firmware tab's mDNS device discovery. Manual IP flashing and captured-audio uploads can still work without host networking if the trainer port is reachable, but auto-detect may not see devices from Docker bridge networking.

Open:

```text
http://localhost:8789
```
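
If `8789` is already taken on the host, a different `REC_PORT` is needed. The sketch below is one possible way to pick a free port and assemble the same `docker run` flags shown above. The helper names `port_is_free` and `docker_run_args` and the fallback range `8789-8799` are illustrative assumptions, not part of the trainer.

```python
import os
import shlex
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if this process can bind the TCP port on the given host address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def docker_run_args(port: int = 8789, data_dir: str = ".") -> list[str]:
    """Assemble the documented `docker run` flags for a chosen port and data folder."""
    return [
        "docker", "run", "-d",
        "--gpus", "all",
        "--network", "host",
        "-e", f"REC_PORT={port}",
        # Bind mounts are given as an absolute host path here.
        "-v", f"{os.path.abspath(data_dir)}:/data",
        "ghcr.io/tatertotterson/microwakeword:latest",
    ]


if __name__ == "__main__":
    # Try 8789 first, then the next few ports (this fallback range is arbitrary).
    port = next((p for p in range(8789, 8800) if port_is_free(p)), None)
    if port is None:
        raise SystemExit("no free port found between 8789 and 8799")
    # Print the command so it can be reviewed before running it.
    print(shlex.join(docker_run_args(port, data_dir=".")))
```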

If you change `REC_PORT`, open that port instead and use the same port in the ESPHome `Trainer App URL`.

---
## What The UI Does

- `Trainer` starts a wake-word session, shows positive/negative sample counts, and launches training.
- `Captured Audio` reviews clips sent by ESPHome sats, including wake hits, close misses, and false wakes.
- `Samples` plays, removes, clears, and manually imports personal or negative samples.
- `Firmware` builds the latest `microWakeWords` ESPHome YAMLs from GitHub and flashes VoicePE or Satellite1 over OTA.
- Popup consoles show colorized training and firmware logs while long-running jobs are active.

---
## Captured Audio Workflow

To collect samples from a sat, flash it with the Tater firmware from [TaterTotterson/microWakeWords](https://github.com/TaterTotterson/microWakeWords). The `Firmware` tab can build and flash the VoicePE or Satellite1 YAMLs directly from that repo.

After flashing, the device exposes ESPHome entities for capture setup:

- `Capture Wake Audio` toggles upload of wake-word triggers.
- `Capture Close Misses` toggles upload of near misses.
- `Trainer App URL` sets the trainer address, for example `http://<trainer-ip>:8789`.

ESPHome devices can send raw captured audio to:

```text
/api/upload_captured_audio_raw
```
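
Uploads only arrive while the trainer is listening at the configured address. A minimal sketch for checking that a `Trainer App URL` is reachable from another machine on the network follows. It only tests that the TCP port accepts connections; it does not call the upload endpoint, and the function name `trainer_reachable` is a hypothetical helper.

```python
import socket
from urllib.parse import urlsplit


def trainer_reachable(trainer_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(trainer_url)
    host = parts.hostname
    if host is None:
        return False  # not a usable URL, for example a missing scheme
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace with the value entered in the device's `Trainer App URL` entity.
    print(trainer_reachable("http://localhost:8789"))
```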

Keep the training app running and reachable at the `Trainer App URL` while capture is enabled. The sats upload clips live; if the app is stopped or the URL is wrong, captured audio will not be saved.

In the `Captured Audio` tab:

- play each clip from the inbox
- mark good wake-word clips as `This is good`
- mark bad triggers as `False wake`
- discard clips that should not be used

Approved clips move into:

```text
/data/personal_samples/
```

False wakes move into:

```text
/data/negative_samples/
```
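
To see how many reviewed clips have accumulated outside the UI, the two folders can be counted directly on the host. This is a rough sketch that assumes clips are stored as `.wav` files at the top level of each folder; the function names are illustrative and the numbers may not match the trainer's own counters exactly.

```python
from pathlib import Path


def count_wavs(folder: Path) -> int:
    """Count .wav files directly inside a folder (0 if the folder is missing)."""
    if not folder.is_dir():
        return 0
    return sum(
        1 for item in folder.iterdir()
        if item.is_file() and item.suffix.lower() == ".wav"
    )


def sample_counts(data_dir: str = "/data") -> dict[str, int]:
    """Return positive and negative clip counts for a mounted data directory."""
    root = Path(data_dir)
    return {
        "personal": count_wavs(root / "personal_samples"),
        "negative": count_wavs(root / "negative_samples"),
    }


if __name__ == "__main__":
    # On the host, point this at the folder that is mounted as /data.
    print(sample_counts("."))
```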

Captured audio is boosted for easier playback in the UI and is still stored in the training format (`16 kHz / mono / 16-bit PCM WAV`).

---
## Samples

The `Samples` tab is the sample library.

- `Personal` samples are positive examples of the wake word.
- `Negative` samples are reviewed false wakes or hard negatives.
- Both can be played back and removed one at a time.
- Manual upload is available here as an optional seed path.

Accepted manual upload formats include:

- WAV
- MP3
- M4A
- FLAC
- OGG
- AAC
- OPUS
- WEBM

Uploads are validated or converted with `ffmpeg` into:

```text
16 kHz / mono / 16-bit PCM WAV
```
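
For preparing clips before uploading them, the target format can be reproduced outside the trainer. The sketch below builds an `ffmpeg` command using standard options (`-ar`, `-ac`, `-c:a pcm_s16le`) and checks a WAV header with Python's `wave` module. It is an approximation of the validate-or-convert step, not the trainer's actual code, and the helper names are hypothetical.

```python
import wave

TARGET_RATE = 16_000      # 16 kHz
TARGET_CHANNELS = 1       # mono
TARGET_SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit


def ffmpeg_convert_args(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the destination if it exists
        "-i", src,
        "-ar", str(TARGET_RATE),
        "-ac", str(TARGET_CHANNELS),
        "-c:a", "pcm_s16le",       # signed 16-bit little-endian PCM
        dst,
    ]


def is_training_format(path: str) -> bool:
    """Return True if the file is an uncompressed WAV in the target format."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getframerate() == TARGET_RATE
                and wav.getnchannels() == TARGET_CHANNELS
                and wav.getsampwidth() == TARGET_SAMPLE_WIDTH
                and wav.getcomptype() == "NONE"
            )
    except (wave.Error, EOFError):
        return False  # not a WAV file the wave module can read


# Example: subprocess.run(ffmpeg_convert_args("clip.mp3", "clip.wav"), check=True)
```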

Starting a new session does not clear samples. Use the clear buttons in `Samples` if you want to remove saved personal or negative clips.

---
## Training Flow

1. Enter the wake phrase in `Trainer`.
2. Choose the language.
3. Optionally test pronunciation with `Test TTS`.
4. Review the positive and negative sample counts.
5. Click `Start training`.
6. Watch the popup training console.

Personal samples are optional. Training can run with zero personal samples after confirmation, using generated TTS samples and the stock negative datasets.

Reviewed negative samples are converted into `/data/work/reviewed_negative_features/` and inserted into the training YAML as a hard-negative feature set when present.

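
The "when present" behavior can be pictured as a conditional append to the list of feature sets in the training config. The following is a sketch only: the key names are modeled on the feature entries used in upstream microWakeWord training configs and should be treated as assumptions, the weights are placeholders, and `add_reviewed_negatives` is a hypothetical helper rather than the trainer's own function.

```python
from pathlib import Path


def add_reviewed_negatives(
    config: dict,
    features_dir: str = "/data/work/reviewed_negative_features",
) -> dict:
    """Append a hard-negative feature set if the features folder has content."""
    path = Path(features_dir)
    if not path.is_dir() or not any(path.iterdir()):
        return config  # nothing reviewed yet, leave the config untouched

    entry = {
        "features_dir": str(path),
        "sampling_weight": 1.0,            # placeholder value
        "penalty_weight": 1.0,             # placeholder value
        "truth": False,                    # negatives must not trigger the model
        "truncation_strategy": "random",   # assumed option name
        "type": "mmap",                    # assumed storage type
    }
    features = config.setdefault("features", [])
    # Avoid adding the same folder twice when the function is re-run.
    if all(item.get("features_dir") != entry["features_dir"] for item in features):
        features.append(entry)
    return config
```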

---
## Language Support

The language picker is dynamic.

- `en` is always available.
- English keeps the existing dedicated generator model path.
- Non-English languages are discovered from the Piper voices catalog and any local Piper voice metadata.
- When a non-English language is selected, the trainer downloads all voices for that selected language only.
- Already-downloaded voices are reused.
- The trainer does not download every language up front.

If the upstream Piper catalog is unavailable, already-installed local voices are used when available.

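
The selection rules above can be summarized in a few lines. This sketch assumes a simplified data model in which the catalog and the local metadata are both plain mappings from voice name to language code; the real Piper catalog carries more fields, and the function names and voice names here are illustrative only.

```python
from typing import Optional


def available_languages(catalog: Optional[dict], installed: dict) -> list[str]:
    """Languages to show in the picker: `en` plus catalog and local languages."""
    languages = {"en"}                         # English is always offered
    languages.update((catalog or {}).values())
    languages.update(installed.values())
    return sorted(languages)


def voices_to_download(language: str, catalog: Optional[dict], installed: dict) -> list[str]:
    """Catalog voices for one language that are not installed yet."""
    if language == "en":
        return []   # English uses its dedicated generator model path
    if catalog is None:
        return []   # catalog unavailable: rely on already-installed voices
    wanted = sorted(name for name, lang in catalog.items() if lang == language)
    return [name for name in wanted if name not in installed]


if __name__ == "__main__":
    catalog = {"de_DE-thorsten-medium": "de", "de_DE-kerstin-low": "de", "fr_FR-siwis-medium": "fr"}
    installed = {"de_DE-kerstin-low": "de"}
    print(available_languages(catalog, installed))
    print(voices_to_download("de", catalog, installed))
```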

---
## Dataset Behavior

The first training run downloads and prepares missing training assets into `/data`, including:

- Piper voices for the selected language
- negative datasets and background data
- the Python training environment
- generated samples and augmented feature caches

After those assets are prepared, later runs reuse the local copies unless the mounted `/data` contents are deleted.

---
## Firmware Flashing

The `Firmware` tab builds and flashes Tater firmware for supported ESPHome sats.

- Downloads the latest firmware YAML templates from `TaterTotterson/microWakeWords` on GitHub.
- Lets you choose `VoicePE` or `Satellite1`.
- Auto-detects ESPHome devices with mDNS when the container is running with host networking.
- Allows manual IP or hostname entry if discovery does not find the device.
- Saves firmware form values so you do not re-enter sounds and URLs every run.
- Lists locally trained wake words from `/data/trained_wake_words/` for easy model selection.
- Builds with ESPHome and flashes OTA.
- Streams ESPHome output in a colorized firmware console.

Firmware YAMLs are intentionally pulled from GitHub each time. There is no local fallback path in the trainer UI.

---
## Output Files

Successful runs produce timestamped training output folders such as:

```text
/data/output/<timestamp>-<wake_word>-<samples>-<steps>/<wake_word>.tflite
/data/output/<timestamp>-<wake_word>-<samples>-<steps>/<wake_word>.json
```

The trainer also syncs firmware-ready artifacts into:

```text
/data/trained_wake_words/<wake_word>.tflite
/data/trained_wake_words/<wake_word>.json
```
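
Each firmware-ready model is a `.tflite` file with a matching `.json` manifest. A small sketch for listing complete pairs in that folder from the host is shown below. It assumes a wake word counts as usable only when both files exist, which may differ from how the trainer builds its own list; `list_trained_wake_words` is a hypothetical helper.

```python
from pathlib import Path


def list_trained_wake_words(folder: str = "/data/trained_wake_words") -> list[str]:
    """Return wake-word names that have both a .tflite and a .json file."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        model.stem
        for model in root.glob("*.tflite")
        if model.with_suffix(".json").is_file()
    )


if __name__ == "__main__":
    # On the host, use the mounted folder, for example ./trained_wake_words.
    print(list_trained_wake_words("trained_wake_words"))
```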

The firmware tab uses `/data/trained_wake_words/` to populate the wake-word dropdown.

---
## Resetting Everything

If you want a clean slate, stop the container and remove the contents of the mounted `/data` directory.

That removes:

- personal samples
- negative samples
- captured inbox clips
- downloaded Piper voices
- cached datasets
- training environments
- trained models
- firmware build caches

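
For a scripted reset, the idea is to empty the host folder that is mounted as `/data` while keeping the folder itself. The sketch below defaults to a dry run that only lists what would be deleted. `clear_data_dir` is a hypothetical helper; it should only be run with the container stopped and only against the correct folder, since the deletion cannot be undone.

```python
import shutil
from pathlib import Path


def clear_data_dir(data_dir: str, dry_run: bool = True) -> list[Path]:
    """List (and optionally delete) everything inside the data directory.

    The directory itself is kept so the Docker mount point stays valid.
    """
    root = Path(data_dir)
    entries = sorted(root.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)   # remove a whole subfolder
            else:
                entry.unlink()         # remove a file or a symlink
    return entries


if __name__ == "__main__":
    # Dry run: print what a reset of the current folder would remove.
    for item in clear_data_dir(".", dry_run=True):
        print(item)
```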

---
## Important Notes

- Personal samples are optional.
- Negative samples are optional but useful for reducing false wakes.
- The UI server is `trainer_server.py`.
- The launcher is `run.sh`.
- Firmware capture settings live on the ESPHome device and can be toggled from the device entities after flashing.

---
## Credits

Built on top of:

- [microWakeWord](https://github.com/kahrendt/microWakeWord)
- [piper-sample-generator](https://github.com/rhasspy/piper-sample-generator)