taco/microWakeWord-Trainer-Nvidia-Docker

mirror of https://github.com/TaterTotterson/microWakeWord-Trainer-Nvidia-Docker.git synced 2026-06-12 20:10:19 -06:00

MasterPhooey 7b028e4420 update readme

2026-04-14 23:07:58 -05:00

3.8 KiB

microWakeWord NVIDIA Docker Trainer UI

Train custom microWakeWord models in Docker with:

uploaded personal voice samples
automatically generated Piper TTS samples
a browser-based trainer UI
live training logs in a popup console

This project no longer records audio in the browser. The UI is now upload-first: users add their own audio files, the app validates or converts them, and training runs from the same page.

Docker Image

docker pull ghcr.io/tatertotterson/microwakeword:latest

Run The Container

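docker run -d \
  --gpus all \
  -p 8888:8888 \
  -v "$(pwd)":/data \
  ghcr.io/tatertotterson/microwakeword:latest

What these flags do:

-d runs the container in the background
--gpus all enables GPU acceleration (requires the NVIDIA Container Toolkit on the host)
-p 8888:8888 exposes the trainer UI on port 8888 of the host
-v "$(pwd)":/data mounts the current directory at /data, so models, downloaded voices, datasets, and personal samples persist across container restarts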

Then open:

http://localhost:8888
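
Because the container is started with -d, the UI may take a moment to answer after docker run returns. The following is a minimal sketch of a readiness check, assuming only that the UI speaks plain HTTP on the mapped port. The helper name wait_for_ui and its parameters are hypothetical and not part of this project.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_ui(url="http://localhost:8888", timeout_s=60.0, interval_s=2.0):
    """Poll `url` until it answers over HTTP or `timeout_s` elapses.

    Hypothetical helper, not part of the project. Returns True as soon as
    the server sends any HTTP response, False if the deadline passes first.
    """
    # Ignore proxy settings from the environment: the target is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener.open(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status still proves the server is listening.
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_ui() with the defaults polls http://localhost:8888 for up to a minute. A False result usually means the container is still starting or the port mapping differs from the one shown above.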

What The UI Does

Start a wake word session
Test TTS pronunciation
Upload one or many personal samples
Normalize uploads to 16 kHz / mono / 16-bit PCM WAV
Train with or without personal samples
Show a popup console with live progress and logs

Personal samples are optional. If none are uploaded, the trainer can still proceed with TTS-only data after confirmation.

Personal Samples

Accepted upload formats include:

WAV
MP3
M4A
FLAC
OGG
AAC
OPUS
WEBM

The backend validates or converts uploads with ffmpeg and stores the normalized files in:

/data/personal_samples/
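
To prepare or check samples outside the UI, the target format named above (16 kHz, mono, 16-bit PCM WAV) can be reproduced with a short script. This is a sketch under assumptions: ffmpeg is on the PATH, the function names are hypothetical, and the backend's actual ffmpeg options may differ from the ones chosen here.

```python
import subprocess
import wave
from pathlib import Path

TARGET_RATE_HZ = 16000   # 16 kHz
TARGET_CHANNELS = 1      # mono
TARGET_SAMPLE_BYTES = 2  # 16-bit PCM


def is_normalized_wav(path):
    """Return True if `path` is already a 16 kHz, mono, 16-bit PCM WAV."""
    try:
        with wave.open(str(path), "rb") as wav:
            return (
                wav.getframerate() == TARGET_RATE_HZ
                and wav.getnchannels() == TARGET_CHANNELS
                and wav.getsampwidth() == TARGET_SAMPLE_BYTES
            )
    except (wave.Error, EOFError, OSError):
        # Not a PCM WAV that the `wave` module can read (e.g. MP3, FLAC).
        return False


def ffmpeg_normalize_args(src, dst):
    """Build an ffmpeg command that converts `src` to the target format.

    Assumed options, not taken from the project's backend.
    """
    return [
        "ffmpeg", "-y",                   # overwrite dst if it exists
        "-i", str(src),
        "-ar", str(TARGET_RATE_HZ),       # resample to 16 kHz
        "-ac", str(TARGET_CHANNELS),      # downmix to mono
        "-c:a", "pcm_s16le",              # 16-bit little-endian PCM
        str(dst),
    ]


def normalize(src, dst):
    """Copy `src` if it already matches the target, else convert it."""
    if is_normalized_wav(src):
        Path(dst).write_bytes(Path(src).read_bytes())
    else:
        subprocess.run(ffmpeg_normalize_args(src, dst), check=True)
```

Checking first avoids re-encoding files that already match, which mirrors the "validates or converts" wording above without claiming to be the backend's exact logic.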

Notes:

starting a new session does not clear personal samples
use the Clear personal samples button if you want to wipe them
any uploaded personal samples are automatically included in training

Language Support

The language selector is dynamic.

en is always available
non-English languages are populated from Piper voice metadata
when you train with a non-English language, the backend downloads all Piper ONNX voices for that language only
it does not pre-download every language
already-downloaded voices are reused on later runs

English continues to use its existing, dedicated generator model. Non-English languages use the Piper ONNX voices of the selected language.

If the Piper catalog is unavailable, already-installed local voices can still be used.
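
The reuse and fallback rules above can be illustrated with a small planning function. This is a sketch under stated assumptions, not the project's code: the catalog is modeled as a plain dict from voice name to language code, voices are assumed to be stored as `<voice name>.onnx` files, and voice names are assumed to begin with the language code followed by an underscore (as in Piper names such as de_DE-...). The name plan_voices is hypothetical.

```python
from pathlib import Path


def plan_voices(language, catalog, voices_dir):
    """Return (reuse, download) lists of voice names for `language`.

    Hypothetical helper. `catalog` maps voice name -> language code, or is
    None when the catalog could not be fetched.
    """
    voices_dir = Path(voices_dir)
    if voices_dir.is_dir():
        local = sorted(p.stem for p in voices_dir.glob("*.onnx"))
    else:
        local = []

    if catalog is None:
        # Catalog unavailable: fall back to voices already installed locally.
        reuse = [v for v in local if v.startswith(language + "_")]
        return reuse, []

    # Only voices of the selected language are considered, never all languages.
    wanted = sorted(v for v, lang in catalog.items() if lang == language)
    reuse = [v for v in wanted if v in local]
    download = [v for v in wanted if v not in local]
    return reuse, download
```

With a catalog available, only the missing voices of the selected language end up in the download list. With no catalog, the function returns whatever matching voices are already on disk and downloads nothing.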

Training Behavior

1. Enter the wake word
2. Optionally test pronunciation
3. Optionally upload personal samples
4. Click Start training
5. Watch the popup console for:
   - selected-language voice downloads when needed
   - sample generation progress
   - dataset setup
   - training progress and completion

The Open console button lets you reopen the log window after closing it.

First Run Notes

The first real training run may download large training assets into /data, such as:

Piper voices for the selected language
training datasets and background data
Python training environment dependencies

These assets are cached in /data and reused on later runs unless you delete them.

Output Files

Successful runs produce:

/data/output/<wake_word>.tflite
/data/output/<wake_word>.json

If those files already exist, the trainer creates timestamped backups before replacing them.
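
The backup-before-replace behavior can be sketched as follows. The naming pattern `<stem>.<YYYYmmdd-HHMMSS><suffix>` and the helper name backup_existing are assumptions made for illustration; the trainer's real backup names may differ.

```python
import time
from pathlib import Path


def backup_existing(path, now=None):
    """Rename `path` to a timestamped sibling if it exists.

    Hypothetical helper. Returns the backup Path, or None when there was
    nothing to back up. `now` is seconds since the epoch (defaults to the
    current time) and is exposed mainly to make the function testable.
    """
    path = Path(path)
    if not path.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    # Assumed naming scheme: model.tflite -> model.<stamp>.tflite
    backup = path.with_name(f"{path.stem}.{stamp}{path.suffix}")
    path.rename(backup)
    return backup
```

Calling the helper once for the .tflite file and once for the .json file before writing new output keeps the previous model pair recoverable.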

Resetting Everything

If you want a clean slate, stop the container and remove the contents of the host directory you mounted at /data.

That will remove:

personal samples
downloaded Piper voices
cached datasets
training environments
trained models

Notes

browser microphone recording has been removed
personal samples are optional
the server module is now trainer_server.py
the launcher script is still named run_recorder.sh for compatibility

Credits

Built on top of:
