update readme

2026-06-12 20:10:19 -06:00 · 2026-04-14 23:07:58 -05:00
parent b3d9f0e369
commit 7b028e4420
1 changed file with 127 additions and 123 deletions
--- a/README.md
+++ b/README.md
@@ -1,25 +1,19 @@
 <div align="center">
-  <h1>🎙️ microWakeWord Nvidia Trainer</h1>
+  <h1>microWakeWord NVIDIA Docker Trainer UI</h1>
  <img src="https://github.com/user-attachments/assets/57e25705-04ae-434e-ba2b-21c4f87d9044" width="800" />
 </div>
-Train **microWakeWord** detection models using a simple **web-based recorder + trainer UI**, packaged in a Docker container.
+Train custom microWakeWord models in Docker with:
-No Jupyter notebooks required. No manual cell execution. Just record your voice (optional) and train.
+- uploaded personal voice samples
 - automatically generated Piper TTS samples
 - a browser-based trainer UI
 - live training logs in a popup console
 This project no longer records audio in the browser. The UI is now upload-first: users add their own audio files, the app validates or converts them, and training runs from the same page.
 ---
-<img width="100" height="44" alt="unraid_logo_black-339076895" src="https://github.com/user-attachments/assets/87351bed-3321-4a43-924f-fecf2e4e700f" />
+## Docker Image
 **microWakeWord_Trainer-Nvidia** is available in the **Unraid Community Apps** store.
 Install directly from the Unraid App Store with a one-click template.
 ---
 <img width="100" height="56" alt="unraid_logo_black-339076895" src="https://github.com/user-attachments/assets/bf959585-ae13-4b4d-ae62-4202a850d35a" />
 ### Pull the Docker Image
 ```bash
 docker pull ghcr.io/tatertotterson/microwakeword:latest
@@ -27,7 +21,7 @@ docker pull ghcr.io/tatertotterson/microwakeword:latest
 ---
-### Run the Container
+## Run The Container
 ```bash
 docker run -d \
@@ -37,133 +31,143 @@ docker run -d \
  ghcr.io/tatertotterson/microwakeword:latest
 ```
-**What these flags do:**
+What these flags do:
 - `--gpus all` → Enables GPU acceleration  
 - `-p 8888:8888` → Exposes the Recorder + Trainer WebUI  
 - `-v $(pwd):/data` → Persists all models, datasets, and cache  
---
+- `--gpus all` enables GPU acceleration
 - `-p 8888:8888` exposes the trainer UI
 - `-v $(pwd):/data` persists models, downloaded voices, datasets, and personal samples
-### Open the Recorder WebUI
+Then open:
 Open your browser and go to:
 👉 **http://localhost:8888**
 You’ll see the **microWakeWord Recorder & Trainer UI**.
 ---
 ## 🎤 Recording Voice Samples (Optional)
 Personal voice recordings are **optional**.
 - You may **record your own voice** for better accuracy  
 - Or simply **click “Train” without recording anything**
 If no recordings are present, training will proceed using **synthetic TTS samples only**.
 ### Remote systems (important)
 If you are running this on a **remote PC / server**, browser-based recording will not work unless:
 - You use a **reverse proxy** (HTTPS + mic permissions), **or**
 - You access the UI via **localhost** on the same machine
 Training itself works fine remotely — only recording requires local microphone access.
 ---
 ### 🎙️ Recording Flow
 1. Enter your wake word
 2. Test pronunciation with **Test TTS**
 3. Choose:
   - Number of speakers (e.g. family members)
   - Takes per speaker (default: 10)
 4. Click **Begin recording**
 5. Speak naturally — recording:
   - Starts when you talk
   - Stops automatically after silence
 6. Repeat for each speaker
 Files are saved automatically to:
 ```
 personal_samples/
  speaker01_take01.wav
  speaker01_take02.wav
  speaker02_take01.wav
  ...
 ```
 ---
 ## 🧠 Training Behavior (Important Notes)
 ### ⏬ First training run
 The **first time you click Train**, the system will download **large training datasets** (background noise, speech corpora, etc.).
 - This can take **several minutes**
 - This happens **only once**
 - Data is cached inside `/data`
 You **will NOT need to download these again** unless you delete `/data`.
 ---
 ### 🔁 Re-training is safe and incremental
 - You can train **multiple wake words** back-to-back
 - You do **NOT** need to clear any folders between runs
 - Old models are preserved in timestamped output directories
 - All required cleanup and reuse logic is handled automatically
 ---
 ## 📦 Output Files
 When training completes, you’ll get:
 - `<wake_word>.tflite` – quantized streaming model  
 - `<wake_word>.json` – ESPHome-compatible metadata  
 Both are saved under:
 ```text
-/data/output/
+http://localhost:8888
 ```
-Each run is placed in its own timestamped folder.
+---
 ## What The UI Does
 - Start a wake word session
 - Test TTS pronunciation
 - Upload one or many personal samples
 - Normalize uploads to `16 kHz / mono / 16-bit PCM WAV`
 - Train with or without personal samples
 - Show a popup console with live progress and logs
 Personal samples are optional. If none are uploaded, the trainer can still proceed with TTS-only data after confirmation.
 ---
-## 🎤 Optional: Personal Voice Samples (Advanced)
+## Personal Samples
-If you record personal samples:
+Accepted upload formats include:
 - They are automatically augmented
 - They are **up-weighted during training**
 - This significantly improves real-world accuracy
-No configuration required — detection is automatic.
+- WAV
 - MP3
 - M4A
 - FLAC
 - OGG
 - AAC
 - OPUS
 - WEBM
 The backend validates or converts uploads with `ffmpeg` and stores the normalized files in:
 ```text
 /data/personal_samples/
 ```
 Notes:
 - starting a new session does not clear personal samples
 - use the `Clear personal samples` button if you want to wipe them
 - any uploaded personal samples are automatically included in training
 ---
-## 🔄 Resetting Everything (Optional)
+## Language Support
-If you want a **completely clean slate**:
+The language selector is dynamic.
-Delete the /data folder
+- `en` is always available
 - non-English languages are populated from Piper voice metadata
 - when you train with a non-English language, the backend downloads all Piper ONNX voices for that selected language only
 - it does not pre-download every language
 - already-downloaded voices are reused on later runs
-Then restart the container.
+English stays on its existing dedicated generator model path. Non-English languages use the selected language's ONNX Piper voices.
-⚠️ This will:
+If the Piper catalog is unavailable, already-installed local voices can still be used.
 - Remove cached datasets
 - Require re-downloading training data
 - Delete trained models
 ---
-## 🙌 Credits
+## Training Behavior
-Built on top of the excellent  
+1. Enter the wake word
-**https://github.com/kahrendt/microWakeWord**
+2. Optionally test pronunciation
 3. Optionally upload personal samples
 4. Click `Start training`
 5. Watch the popup console for:
   - selected-language voice downloads when needed
   - sample generation progress
   - dataset setup
   - training progress and completion
-Huge thanks to the original authors ❤️
+The `Open console` button lets you reopen the log window after closing it.
 ---
 ## First Run Notes
 The first real training run may download large training assets into `/data`, such as:
 - Piper voices for the selected language
 - training datasets and background data
 - Python training environment dependencies
 These are reused later unless you delete `/data`.
 ---
 ## Output Files
 Successful runs produce:
 ```text
 /data/output/<wake_word>.tflite
 /data/output/<wake_word>.json
 ```
 If those files already exist, the trainer creates timestamped backups before replacing them.
 ---
 ## Resetting Everything
 If you want a clean slate, stop the container and remove the contents of your mounted `/data` directory.
 That will remove:
 - personal samples
 - downloaded Piper voices
 - cached datasets
 - training environments
 - trained models
 ---
 ## Notes
 - browser microphone recording has been removed
 - personal samples are optional
 - the server module is now `trainer_server.py`
 - the launcher script is still named `run_recorder.sh` for compatibility
 ---
 ## Credits
 Built on top of:
 - [microWakeWord](https://github.com/kahrendt/microWakeWord)
 - [piper-sample-generator](https://github.com/rhasspy/piper-sample-generator)
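
The updated README says uploads are normalized to `16 kHz / mono / 16-bit PCM WAV` with `ffmpeg`. As a rough illustration of what that target format means, here is a minimal sketch that builds an equivalent `ffmpeg` command and checks whether a WAV file already matches. The function names `ffmpeg_normalize_cmd` and `is_normalized_wav` are hypothetical and are not taken from `trainer_server.py`; the actual backend may use different options.

```python
import wave


def ffmpeg_normalize_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that converts src to 16 kHz mono 16-bit PCM WAV.

    Hypothetical helper: the flags are standard ffmpeg options, but this is
    an assumption about how the conversion could be done, not the project's code.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to a single channel
        "-c:a", "pcm_s16le",   # signed 16-bit little-endian PCM
        dst,
    ]


def is_normalized_wav(path: str) -> bool:
    """Return True if path is a 16 kHz, mono, 16-bit PCM WAV file."""
    try:
        with wave.open(path, "rb") as wf:
            return (
                wf.getframerate() == 16000
                and wf.getnchannels() == 1
                and wf.getsampwidth() == 2  # 2 bytes per sample = 16-bit
            )
    except (wave.Error, EOFError):
        # Not a PCM WAV file that the wave module can parse.
        return False
```

To run the conversion, pass the command list to `subprocess.run(cmd, check=True)` on a machine with `ffmpeg` installed. Files for which `is_normalized_wav` already returns `True` would not need converting.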