# microWakeWord Nvidia Trainer & Recorder
Train **microWakeWord** detection models using a simple **web-based recorder + trainer UI**, packaged in a Docker container.
No Jupyter notebooks required. No manual cell execution. Just record your voice (optional) and train.
---
**microWakeWord_Trainer-Nvidia** is available in the **Unraid Community Apps** store.
Install directly from the Unraid App Store with a one-click template.
---
### Pull the Docker Image
```bash
docker pull ghcr.io/tatertotterson/microwakeword:latest
```
---
### Run the Container
```bash
docker run -d \
--gpus all \
-p 8888:8888 \
-v $(pwd):/data \
ghcr.io/tatertotterson/microwakeword:latest
```
**What these flags do:**
- `--gpus all`: Enables GPU acceleration (requires the NVIDIA Container Toolkit on the host)
- `-p 8888:8888`: Exposes the Recorder + Trainer WebUI on port 8888
- `-v $(pwd):/data`: Persists all models, datasets, and cache in the current host directory
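
The flags above can also be wrapped in a small helper so that the data directory and host port are explicit rather than tied to the current working directory. This is only a sketch: the function name `start_trainer`, the container name `mww-trainer`, and the default `~/mww-data` path are illustrative assumptions, not something the project defines.

```bash
# Sketch: start the trainer with an explicit data directory and host port.
# start_trainer, mww-trainer, and ~/mww-data are hypothetical names.

IMAGE="ghcr.io/tatertotterson/microwakeword:latest"

start_trainer() {
  local data_dir="${1:-$HOME/mww-data}"   # host directory mounted at /data
  local host_port="${2:-8888}"            # host port mapped to the UI port 8888

  mkdir -p "$data_dir" || return 1
  # Bind mounts need an absolute path, so resolve the directory first.
  data_dir="$(cd "$data_dir" && pwd)" || return 1

  docker run -d \
    --name mww-trainer \
    --gpus all \
    -p "${host_port}:8888" \
    -v "${data_dir}:/data" \
    "$IMAGE"
}

# Usage: start_trainer "$HOME/mww-data" 8888
```

With a different host port (for example `start_trainer ~/mww-data 9000`), the UI would be reachable on that port instead of 8888.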
---
### Open the Recorder WebUI
Open your browser and go to:
**http://localhost:8888**
You'll see the **microWakeWord Recorder & Trainer UI**.
---
## 🎤 Recording Voice Samples (Optional)
Personal voice recordings are **optional**.
- You may **record your own voice** for better accuracy
- Or simply **click "Train" without recording anything**
If no recordings are present, training will proceed using **synthetic TTS samples only**.
### Remote systems (important)
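If you are running this on a **remote PC / server**, browser-based recording will not work by default, because browsers only grant microphone access on secure origins (HTTPS or `localhost`). Recording works only if:
- You use a **reverse proxy** that serves the UI over HTTPS (and you allow the mic permission), **or**
- You access the UI via **localhost** on the same machine
Training itself works fine remotely; only recording requires local microphone access.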
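
One possible way to satisfy the `localhost` condition for a remote container, without setting up a reverse proxy, is an SSH port forward. The project does not document this; it is a sketch that assumes you have SSH access to the server, and `open_ui_tunnel` and `user@remote-host` are hypothetical names.

```bash
# Sketch: forward the remote UI port to the local machine so the browser
# sees a localhost origin. open_ui_tunnel and user@remote-host are
# hypothetical names used for illustration.

open_ui_tunnel() {
  local remote="$1"
  local port="${2:-8888}"

  if [ -z "$remote" ]; then
    echo "usage: open_ui_tunnel user@remote-host [port]" >&2
    return 1
  fi

  # -N: do not run a remote command
  # -L: forward the local port to the same port on the remote side
  ssh -N -L "${port}:localhost:${port}" "$remote"
}

# Usage (runs until interrupted with Ctrl+C):
#   open_ui_tunnel user@remote-host
# then open http://localhost:8888 in the browser on the local machine.
```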
---
### Recording Flow
1. Enter your wake word
2. Test pronunciation with **Test TTS**
3. Choose:
   - Number of speakers (e.g. family members)
   - Takes per speaker (default: 10)
4. Click **Begin recording**
5. Speak naturally. Recording:
   - Starts when you talk
   - Stops automatically after silence
6. Repeat for each speaker
Files are saved automatically to:
```text
personal_samples/
  speaker01_take01.wav
  speaker01_take02.wav
  speaker02_take01.wav
  ...
```
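
To check that every speaker has the expected number of takes, the naming pattern in the listing above can be counted from a shell. This is a minimal sketch: `count_takes` is a hypothetical helper, and the location of `personal_samples/` on the host is an assumption, so pass the actual path as the argument.

```bash
# Sketch: count recorded takes per speaker, based on the
# speakerNN_takeMM.wav naming shown above. count_takes is a hypothetical
# helper; the directory path is an assumption and is passed as an argument.

count_takes() {
  local dir="${1:-personal_samples}"
  local f name

  for f in "$dir"/speaker*_take*.wav; do
    [ -e "$f" ] || continue              # glob matched nothing
    name="${f##*/}"                      # strip the directory part
    printf '%s\n' "${name%%_take*}"      # keep only "speakerNN"
  done | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage: count_takes ./personal_samples
```

For the three files in the listing, this prints `speaker01: 2` and `speaker02: 1`.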
---
## 🧠 Training Behavior (Important Notes)
### ⬇ First training run
The **first time you click Train**, the system will download **large training datasets** (background noise, speech corpora, etc.).
- This can take **several minutes**
- This happens **only once**
- Data is cached inside `/data`
You **will NOT need to download these again** unless you delete `/data`.
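
Because the cached datasets live in the host directory mounted at `/data`, their disk footprint can be inspected from the host. The following is a sketch only: `data_usage` is a hypothetical helper, and it makes no assumption about how the folders inside `/data` are named.

```bash
# Sketch: show how much disk space each top-level entry of the data
# directory uses, followed by the total. data_usage is a hypothetical helper.

data_usage() {
  local data_dir="${1:-.}"

  if [ ! -d "$data_dir" ]; then
    echo "not a directory: $data_dir" >&2
    return 1
  fi

  # Per-entry sizes, smallest first (sort -h understands K/M/G suffixes).
  find "$data_dir" -mindepth 1 -maxdepth 1 -exec du -sh {} + | sort -h
  # Total for the whole directory.
  du -sh "$data_dir"
}

# Usage: data_usage "$(pwd)"   # the directory passed to -v ...:/data
```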
---
### Re-training is safe and incremental
- You can train **multiple wake words** back-to-back
- You do **NOT** need to clear any folders between runs
- Old models are preserved in timestamped output directories
- All required cleanup and reuse logic is handled automatically
---
## 📦 Output Files
When training completes, you'll get:
- `.tflite`: quantized streaming model
- `.json`: ESPHome-compatible metadata
Both are saved under:
```text
/data/output/
```
Each run is placed in its own timestamped folder.
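
To locate the files from the most recent run on the host, something like the following could be used. It is a sketch under two assumptions: the output folder is reachable on the host as `<data dir>/output` (through the `-v` mount), and "most recent" is decided by modification time, since the exact timestamp format of the folder names is not specified here. `latest_run` is a hypothetical helper.

```bash
# Sketch: print the newest run folder under the output directory and the
# .tflite / .json files inside it. latest_run is a hypothetical helper;
# "newest" is decided by modification time, not by parsing folder names.

latest_run() {
  local out_dir="${1:-./output}"
  local newest

  # List subdirectories only (trailing slash), newest first.
  newest="$(ls -1dt "$out_dir"/*/ 2>/dev/null | head -n 1)"
  if [ -z "$newest" ]; then
    echo "no runs found in $out_dir" >&2
    return 1
  fi
  newest="${newest%/}"

  printf '%s\n' "$newest"
  find "$newest" -type f \( -name '*.tflite' -o -name '*.json' \) | sort
}

# Usage: latest_run "$(pwd)/output"
```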
---
## 🎤 Optional: Personal Voice Samples (Advanced)
If you record personal samples:
- They are automatically augmented
- They are **up-weighted during training**
- This significantly improves real-world accuracy
No configuration required; the recordings are detected automatically.
---
## Resetting Everything (Optional)
If you want a **completely clean slate**, delete the `/data` folder (on the host, this is the directory you mounted with `-v`), then restart the container.
⚠️ This will:
- Remove cached datasets
- Require re-downloading training data
- Delete trained models
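
The reset could be scripted roughly as follows. This is a sketch, not a project-provided tool: `reset_trainer_data` is a hypothetical helper, it assumes the container was started from the image named in `IMAGE`, and it empties the host directory rather than removing the directory itself. It deletes data irreversibly, so check the path before running it.

```bash
# Sketch: stop the trainer container and empty the host data directory.
# reset_trainer_data is a hypothetical helper. It permanently deletes
# everything inside the given directory.

IMAGE="ghcr.io/tatertotterson/microwakeword:latest"

reset_trainer_data() {
  local data_dir="$1"
  local id

  if [ -z "$data_dir" ] || [ ! -d "$data_dir" ]; then
    echo "usage: reset_trainer_data <host-data-dir>" >&2
    return 1
  fi

  # Stop any running container that was started from the trainer image.
  for id in $(docker ps -q --filter "ancestor=${IMAGE}"); do
    docker stop "$id"
  done

  # Remove everything inside the directory, including hidden files,
  # but keep the directory itself so it can be mounted again.
  find "$data_dir" -mindepth 1 -delete
}

# Usage: reset_trainer_data "$(pwd)"
```

Afterwards, start the container again with the `docker run` command from the setup section. Files written by the container may be owned by root, in which case the deletion step may need elevated permissions.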
---
## Credits
Built on top of the excellent
**https://github.com/kahrendt/microWakeWord**
Huge thanks to the original authors ❤️