taco/microWakeWord-Trainer-Nvidia-Docker

mirror of https://github.com/TaterTotterson/microWakeWord-Trainer-Nvidia-Docker.git synced 2026-06-12 20:10:19 -06:00

George Joseph cb81f7f02d Train from the command line

The files in the `cli` directory let you train wake words
from the command line without the Jupyter notebook
or a web browser. The logic from the notebook has been
split into separate shell scripts and Python files, wrapped by three high-level
scripts that do the following:

* setup_python_venv: Creates a Python virtual environment with all the
packages needed to train.  The venv is created in the container's /data
directory and is therefore stored on the host, not in the container's root
docker volume.

* setup_training_datasets: Downloads, extracts and converts the MIT RIR,
FMA, Audioset and Negative training reference datasets.  Also stored in /data.

* train_wake_word: Generates the wake word samples, augments them with the
audio from the training datasets, and finally runs the microwakeword training.
The resulting model tflite and json files are placed in the /data/output
directory.

See the README.md file for much more information.

2025-12-28 12:48:51 -07:00

cli

Train from the command line

2025-12-28 12:48:51 -07:00

dockerfile

Switch from nvidia/cuda to plain ubuntu:22.04 base image

2025-12-19 10:14:26 -07:00

LICENSE

Initial commit

2025-01-02 20:22:06 -06:00

microWakeWord_training_notebook.ipynb

piper speaking speeds

2025-12-22 20:03:18 -06:00

mmw.png

Add files via upload

2025-01-02 23:15:53 -06:00

README.md

Update README.md

2025-09-27 15:04:16 -05:00

requirements.txt

Switch from nvidia/cuda to plain ubuntu:22.04 base image

2025-12-19 10:14:26 -07:00

startup.sh

Add files via upload

2025-09-26 19:35:09 -05:00

README.md

microWakeWord Trainer Docker

🥔 MicroWakeWord Trainer – Tater Approved

✅ Tater Totterson tested & working on an NVIDIA RTX 3070 Laptop GPU (8 GB VRAM).
Easily train microWakeWord detection models with this pre-built Docker image and JupyterLab notebook.

🚀 Quick Start

Follow these steps to get up and running:

1️⃣ Pull the Pre-Built Docker Image

docker pull ghcr.io/tatertotterson/microwakeword:latest

2️⃣ Run the Docker Container

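docker run --rm -it \
    --gpus all \
    -p 8888:8888 \
    -v "$(pwd)":/data \
    ghcr.io/tatertotterson/microwakeword:latest

What these flags do:

--rm -it → Runs the container interactively and removes it when it exits
--gpus all → Enables GPU acceleration (requires the NVIDIA Container Toolkit on the host)
-p 8888:8888 → Publishes JupyterLab on host port 8888
-v "$(pwd)":/data → Mounts the current folder at /data in the container, so your work is saved on the host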

3️⃣ Open JupyterLab

Visit http://localhost:8888 in your browser; the notebook UI will open.
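
If the page does not load, one quick check is whether anything is accepting connections on the published port. The sketch below uses only Python's standard library and is meant to be run on the host. The function name `jupyter_port_open` is illustrative and not part of this project.

```python
import socket


def jupyter_port_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if jupyter_port_open():
        print("Port 8888 is accepting connections; open http://localhost:8888")
    else:
        print("Nothing is listening on port 8888 yet; check the container output")
```

A successful check only shows that the port is open, not that JupyterLab has finished starting, so the page may still need a few seconds.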

4️⃣ Set Your Wake Word

At the top of the notebook, find this line:

TARGET_WORD = "hey_tater"  # Change this to your desired wake word

Change "hey_tater" to your desired wake word (phonetic spellings often work best).
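
The default value uses lowercase words joined by underscores. As a minimal sketch, assuming you want to keep that style for your own phrase, a small helper can derive it from free-form text. `to_target_word` is a hypothetical helper and not part of the notebook, and the notebook may accept other spellings as well.

```python
import re


def to_target_word(phrase: str) -> str:
    """Lowercase a phrase and join its alphanumeric words with underscores."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    if not words:
        raise ValueError("wake word must contain at least one letter or digit")
    return "_".join(words)


# Mirrors the style of the notebook's default value.
TARGET_WORD = to_target_word("Hey Tater")
print(TARGET_WORD)  # prints: hey_tater
```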

5️⃣ Run the Notebook

Run all cells in the notebook. This process will:

Generate wake word samples
Train a detection model
Output a quantized .tflite model ready for on-device use

6️⃣ Retrieve the Trained Model & JSON

When training finishes, download links for both the .tflite model and its .json manifest will be displayed in the last cell.
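
Because the container's /data directory is the folder you mounted from the host, the files may also be reachable directly on disk. The following sketch searches a folder for .tflite files and checks whether each has a readable JSON file beside it. It assumes the manifest shares the model's base name, which the text above does not confirm, and both function names are illustrative.

```python
import json
from pathlib import Path


def find_model_artifacts(data_dir: str = "."):
    """Pair each .tflite file under data_dir with a same-named .json file, or None."""
    pairs = []
    for model in sorted(Path(data_dir).rglob("*.tflite")):
        manifest = model.with_suffix(".json")  # assumption: same base name
        pairs.append((model, manifest if manifest.is_file() else None))
    return pairs


def manifest_is_valid_json(manifest: Path) -> bool:
    """Return True if the manifest file can be read and parsed as JSON."""
    try:
        json.loads(manifest.read_text(encoding="utf-8"))
        return True
    except (OSError, ValueError):
        return False


if __name__ == "__main__":
    for model, manifest in find_model_artifacts("."):
        if manifest is None:
            status = "missing"
        elif manifest_is_valid_json(manifest):
            status = "ok"
        else:
            status = "invalid JSON"
        print(f"{model} -> manifest: {status}")
```

Run it from the folder you mounted into the container; it only reads files and changes nothing.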

🔄 Resetting to a Clean State

If you need to start fresh:

Delete the contents of the host folder that was mounted at /data in your Docker container.
Restart the container using the steps above.
A fresh copy of the notebook will be placed into the /data directory.

🙌 Credits

This project builds upon the excellent work of kahrendt/microWakeWord.
Huge thanks to the original authors for their contributions to the open-source community!
