# 🎙️ microWakeWord Nvidia Trainer

Train **microWakeWord** detection models using a simple **web-based recorder + trainer UI**, packaged in a Docker container. No Jupyter notebooks required. No manual cell execution. Just record your voice (optional) and train.

---

**microWakeWord_Trainer-Nvidia** is available in the **Unraid Community Apps** store. Install it directly from the Unraid App Store with a one-click template.

---
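Outside Unraid, you start the container yourself with the `docker pull` and `docker run` commands in the next section. Because `-v $(pwd):/data` mounts whichever directory you happen to launch from, you may prefer a fixed folder for models, datasets, and cache. The Bash sketch below is a hypothetical helper, not something shipped with the image: the function names `start_trainer` and `mww_build_args` and the default path `~/microwakeword-data` are illustrative assumptions. It passes the same flags as the README's `docker run` command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the image or the project).
# Usage: source this file, then run: start_trainer ~/microwakeword-data

MWW_IMAGE="ghcr.io/tatertotterson/microwakeword:latest"

# Fill the global MWW_ARGS array with the same `docker run` arguments as the
# README command. Keeping them in an array preserves spaces in the data path.
mww_build_args() {
  local data_dir="$1"
  MWW_ARGS=(run -d --gpus all -p 8888:8888 -v "${data_dir}:/data" "$MWW_IMAGE")
}

# Create the data directory if needed and resolve it to an absolute path
# (a bare name like "mydata" would be treated by Docker as a named volume),
# then start the container in the background.
start_trainer() {
  local data_dir="${1:-$HOME/microwakeword-data}"
  mkdir -p -- "$data_dir" || return 1
  data_dir="$(cd -- "$data_dir" && pwd)" || return 1
  mww_build_args "$data_dir"
  docker "${MWW_ARGS[@]}"
}
```

Sourcing the file only defines the functions; nothing runs until you call `start_trainer`. Whatever directory you pass is what `/data` refers to inside the container.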

### Pull the Docker Image

```bash
docker pull ghcr.io/tatertotterson/microwakeword:latest
```

---

### Run the Container

```bash
# Quote $(pwd) so paths containing spaces are passed to Docker intact
docker run -d \
  --gpus all \
  -p 8888:8888 \
  -v "$(pwd)":/data \
  ghcr.io/tatertotterson/microwakeword:latest
```

**What these flags do:**

- `--gpus all` → Enables GPU acceleration
- `-p 8888:8888` → Exposes the Recorder + Trainer WebUI
- `-v "$(pwd)":/data` → Mounts the current directory as `/data`, which persists all models, datasets, and cache

---

### Open the Recorder WebUI

Open your browser and go to:

👉 **http://localhost:8888**

You’ll see the **microWakeWord Recorder & Trainer UI**.

---

## 🎤 Recording Voice Samples (Optional)

Personal voice recordings are **optional**.

- You may **record your own voice** for better accuracy
- Or simply **click “Train” without recording anything**

If no recordings are present, training will proceed using **synthetic TTS samples only**.

### Remote systems (important)

Browsers only allow microphone access on pages served over HTTPS or from `localhost`. If you are running this on a **remote PC / server**, browser-based recording will therefore not work unless:

- You use a **reverse proxy** (HTTPS + mic permissions), **or**
- You access the UI via **localhost** on the same machine

Training itself works fine remotely; only recording requires local microphone access.

---

### 🎙️ Recording Flow

1. Enter your wake word
2. Test pronunciation with **Test TTS**
3. Choose:
   - Number of speakers (e.g. family members)
   - Takes per speaker (default: 10)
4. Click **Begin recording**
5. Speak naturally. Recording:
   - Starts when you talk
   - Stops automatically after silence
6. Repeat for each speaker

Files are saved automatically to:

```text
personal_samples/
  speaker01_take01.wav
  speaker01_take02.wav
  speaker02_take01.wav
  ...
```

---

## 🧠 Training Behavior (Important Notes)

### ⏬ First training run

The **first time you click Train**, the system will download **large training datasets** (background noise, speech corpora, etc.).
- This can take **several minutes**
- This happens **only once**
- Data is cached inside `/data`

You **will NOT need to download these again** unless you delete `/data`.

---

### 🔁 Re-training is safe and incremental

- You can train **multiple wake words** back-to-back
- You do **NOT** need to clear any folders between runs
- Old models are preserved in timestamped output directories
- All required cleanup and reuse logic is handled automatically

---

## 📦 Output Files

When training completes, you’ll get:

- `.tflite` – quantized streaming model
- `.json` – ESPHome-compatible metadata

Both are saved under:

```text
/data/output/
```

Each run is placed in its own timestamped folder.

---

## 🎤 Optional: Personal Voice Samples (Advanced)

If you record personal samples:

- They are automatically augmented
- They are **up-weighted during training**
- This significantly improves real-world accuracy

No configuration is required; recorded samples are detected automatically.

---

## 🔄 Resetting Everything (Optional)

If you want a **completely clean slate**:

1. Delete the contents of the host directory mounted at `/data` (with the run command above, that is the directory you started the container from)
2. Restart the container

⚠️ This will:

- Remove cached datasets
- Require re-downloading training data
- Delete trained models

---

## 🙌 Credits

Built on top of the excellent **[microWakeWord](https://github.com/kahrendt/microWakeWord)** project.

Huge thanks to the original authors ❤️