Tater Totterson 8f9d290baf Add Docker image pull instructions to README
Updated README to include Docker image instructions and installation details.
2026-01-22 20:14:09 -06:00

<div align="center">
<h1>🎙️ microWakeWord Nvidia Trainer & Recorder</h1>
<img width="1002" height="593" alt="Screenshot 2026-01-18 at 8 13 35AM" src="https://github.com/user-attachments/assets/e1411d8a-8638-4df8-992b-09a46c6e5ddc" />
</div>
Train **microWakeWord** detection models using a simple **web-based recorder + trainer UI**, packaged in a Docker container.
No Jupyter notebooks required. No manual cell execution. Just record your voice (optional) and train.
---
<img width="100" height="44" alt="unraid_logo_black-339076895" src="https://github.com/user-attachments/assets/87351bed-3321-4a43-924f-fecf2e4e700f" />
**microWakeWord_Trainer-Nvidia** is available in the **Unraid Community Apps** store.
Install directly from the Unraid App Store with a one-click template.
### 🚀 Docker
### 1️⃣ Pull the Docker Image
```bash
docker pull ghcr.io/tatertotterson/microwakeword:latest
```
---
### 2️⃣ Run the Container
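```bash
docker run --rm -it \
  --gpus all \
  -p 8888:8888 \
  -v "$(pwd)":/data \
  ghcr.io/tatertotterson/microwakeword:latest
```
**What these flags do:**
- `--rm -it` → Runs interactively and removes the container when it exits
- `--gpus all` → Enables GPU acceleration (requires the NVIDIA Container Toolkit on the host)
- `-p 8888:8888` → Exposes the Recorder + Trainer WebUI on port 8888
- `-v "$(pwd)":/data` → Mounts the current directory at `/data`, so models, datasets, and cache persist across runs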
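
If training does not seem to use the GPU, it can help to confirm that Docker can see the GPU at all. The following is a minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host. The `ubuntu` image and the `check_gpu` helper name are illustrative choices and are not part of this project.

```bash
# check_gpu: hypothetical helper, not part of microWakeWord-Trainer.
# Runs nvidia-smi in a throwaway container to confirm GPU passthrough works.
check_gpu() {
  if docker run --rm --gpus all ubuntu nvidia-smi; then
    echo "GPU is visible to Docker"
  else
    echo "GPU is NOT visible to Docker" >&2
    return 1
  fi
}
```

Call `check_gpu` before starting the trainer. If it fails, the problem most likely lies in the host driver or container toolkit setup rather than in this image.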
---
### 3️⃣ Open the Recorder WebUI
Open your browser and go to:
👉 **http://localhost:8888**
You'll see the **microWakeWord Recorder & Trainer UI**.
---
## 🎤 Recording Voice Samples (Optional)
Personal voice recordings are **optional**.
- You may **record your own voice** for better accuracy
- Or simply **click “Train” without recording anything**
If no recordings are present, training will proceed using **synthetic TTS samples only**.
### Remote systems (important)
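If you are running this on a **remote PC / server**, browser-based recording will not work over plain HTTP, because browsers only grant microphone access to pages served from a secure origin. Recording works if:
- You serve the UI through a **reverse proxy** with HTTPS (and allow mic permissions), **or**
- You access the UI via **localhost** on the same machine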
Training itself works fine remotely — only recording requires local microphone access.
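
One possible way to get a localhost URL from another computer is SSH local port forwarding, since browsers generally treat `http://localhost` as a secure origin. This is a sketch under the assumption that you have SSH access to the server. `user@remote-host` and the `open_tunnel` helper name are placeholders, not something this project provides.

```bash
# open_tunnel: hypothetical helper. Forwards a local port to the same port
# on the remote machine so the WebUI is reachable at http://localhost:<port>.
#   $1 = SSH destination (placeholder, e.g. user@remote-host)
#   $2 = port (defaults to 8888, the port published by the container)
open_tunnel() {
  local remote="$1"
  local port="${2:-8888}"
  # -N: do not run a remote command; -L: local port forwarding
  ssh -N -L "${port}:localhost:${port}" "${remote}"
}
```

Run `open_tunnel user@remote-host` on the computer with the microphone, leave it open, and browse to `http://localhost:8888` there.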
---
### 🎙️ Recording Flow
1. Enter your wake word
2. Test pronunciation with **Test TTS**
3. Choose:
   - Number of speakers (e.g. family members)
   - Takes per speaker (default: 10)
4. Click **Begin recording**
5. Speak naturally — recording:
   - Starts when you talk
   - Stops automatically after silence
6. Repeat for each speaker
Files are saved automatically to:
```
personal_samples/
  speaker01_take01.wav
  speaker01_take02.wav
  speaker02_take01.wav
  ...
```
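
To check how many takes were saved per speaker, the file naming pattern above can be counted from a shell. This is a minimal sketch. The `count_takes` helper is hypothetical, and the location of `personal_samples/` on your system is an assumption, so pass the actual path as the first argument.

```bash
# count_takes: hypothetical helper. Prints the number of takes per speaker,
# based on the speakerNN_takeNN.wav naming pattern.
#   $1 = path to the personal_samples directory (defaults to ./personal_samples)
count_takes() {
  local dir="${1:-personal_samples}"
  local f name
  for f in "$dir"/speaker*_take*.wav; do
    [ -e "$f" ] || continue          # skip the unexpanded pattern if nothing matches
    name="$(basename "$f")"
    echo "${name%%_take*}"           # keep only the "speakerNN" prefix
  done | sort | uniq -c
}
```

For example, `count_takes ./personal_samples` prints one line per speaker, each with a count followed by the speaker name.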
---
## 🧠 Training Behavior (Important Notes)
### ⏬ First training run
The **first time you click Train**, the system will download **large training datasets** (background noise, speech corpora, etc.).
- This can take **several minutes**
- This happens **only once**
- Data is cached inside `/data`
You **will NOT need to download these again** unless you delete `/data`.
---
### 🔁 Re-training is safe and incremental
- You can train **multiple wake words** back-to-back
- You do **NOT** need to clear any folders between runs
- Old models are preserved in timestamped output directories
- All required cleanup and reuse logic is handled automatically
---
## 📦 Output Files
When training completes, you'll get:
- `<wake_word>.tflite`: quantized streaming model
- `<wake_word>.json`: ESPHome-compatible metadata
Both are saved under:
```text
/data/output/
```
Each run is placed in its own timestamped folder.
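
With the `/data` mount from the run command, `/data/output/` appears on the host as `output/` inside the directory you started the container from. The sketch below picks the most recently modified run folder. `latest_run` is a hypothetical helper, and it relies on modification time because the exact folder naming scheme is not assumed here.

```bash
# latest_run: hypothetical helper. Prints the most recently modified
# subfolder of the output directory.
#   $1 = path to the output directory on the host (defaults to ./output)
latest_run() {
  local out="${1:-output}"
  local d newest=""
  for d in "$out"/*/; do
    [ -d "$d" ] || continue                      # skip if the glob matched nothing
    if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
      newest="$d"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "${newest%/}"  # strip the trailing slash
}
```

For example, `ls "$(latest_run ./output)"` lists the files from the newest run, which should include the `.tflite` and `.json` pair described above.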
---
## 🎤 Optional: Personal Voice Samples (Advanced)
If you record personal samples:
- They are automatically augmented
- They are **up-weighted during training**
- This significantly improves real-world accuracy
No configuration required — detection is automatic.
---
## 🔄 Resetting Everything (Optional)
If you want a **completely clean slate**:
Delete the contents of the host folder that you mounted at `/data`, then restart the container.
⚠️ This will:
- Remove cached datasets
- Require re-downloading training data
- Delete trained models
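
Because a reset also deletes trained models, you may want to copy them somewhere else first. This is a minimal sketch, assuming the host folder mounted at `/data` contains the `output/` folder described earlier. The `backup_models` helper and the default backup location are hypothetical.

```bash
# backup_models: hypothetical helper. Copies everything under <data_dir>/output
# to a backup folder and prints the backup path.
#   $1 = host folder mounted at /data (defaults to the current directory)
#   $2 = backup destination (defaults to a timestamped folder in $HOME)
backup_models() {
  local data_dir="${1:-.}"
  local dest="${2:-$HOME/microwakeword-backup-$(date +%Y%m%d-%H%M%S)}"
  if [ ! -d "$data_dir/output" ]; then
    echo "No output folder found in $data_dir" >&2
    return 1
  fi
  mkdir -p "$dest"
  cp -R "$data_dir/output/." "$dest/"   # copy contents, keeping run subfolders
  echo "$dest"
}
```

Run `backup_models .` from the mounted folder before deleting anything, and check that the printed path contains your models.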
---
## 🙌 Credits
Built on top of the excellent
**https://github.com/kahrendt/microWakeWord**
Huge thanks to the original authors ❤️