microWakeWord-Trainer-Nvidi…/cli/Dockerfile
George Joseph cb81f7f02d Train from the command line
The files in the `cli` directory let you train wake words from the
command line, without the Jupyter notebook or a web browser.  The logic
from the notebook has been moved into separate shell scripts and Python
files, wrapped by three high-level scripts:

* setup_python_venv: Creates a Python virtual environment with all the
packages needed to train.  The venv is created in the container's /data
directory and is therefore stored on the host, not in the container's root
docker volume.

* setup_training_datasets: Downloads, extracts and converts the MIT RIR,
FMA, Audioset and Negative training reference datasets.  Also stored in /data.

* train_wake_word: Generates the wake word samples, augments them with the
audio from the training datasets, and finally runs the microWakeWord training.
The resulting TFLite model and JSON files are placed in the /data/output
directory.

See the README.md file for much more information.
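
As a rough sketch of how the pieces above fit together, the snippet below prints a build command and a run command that bind-mounts a host directory at /data. The image tag `mww-cli` and the host directory `mww-data` are assumptions made for illustration, not names from this repository, and any GPU-related flags your container runtime needs are left out.

```shell
#!/usr/bin/env bash
# Assumed names, not taken from the repository: adjust both to your setup.
IMAGE="mww-cli"              # hypothetical image tag
HOST_DATA="$PWD/mww-data"    # hypothetical host directory mounted at /data

# Print the commands rather than running them, so nothing happens by accident.
build_cmd() { printf 'docker build -t %s cli\n' "$IMAGE"; }
run_cmd()   { printf 'docker run --rm -it -v %s:/data %s\n' "$HOST_DATA" "$IMAGE"; }

build_cmd
run_cmd

# Inside the container the three scripts are on PATH; run them in this order
# (arguments omitted here; see README.md for what each one accepts):
#   setup_python_venv
#   setup_training_datasets
#   train_wake_word
```

Because the venv, the datasets and the output all live under /data, they stay in the host directory when the container is removed.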
2025-12-28 12:48:51 -07:00


# Since this is a pure Python environment, we don't need to start
# with a huge CUDA image. A standard Ubuntu image will do.
FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_ROOT_USER_ACTION=ignore \
    HF_HUB_DISABLE_SYMLINKS_WARNING=1 \
    PATH="/root/mww-scripts:${PATH}"
# System deps
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3.12 python3.12-venv python3.12-dev python3-pip python-is-python3 \
        git wget curl unzip ca-certificates nano less \
    && rm -rf /var/lib/apt/lists/* \
    && mkdir -p /data
COPY --chown=root:root --chmod=0755 .bashrc /root/
COPY --chown=root:root --chmod=0755 setup_* wake_word_sample* train_wake_word \
    test_python cudainfo system_summary shell.functions requirements.txt /root/mww-scripts/
# Docker and Podman send SIGTERM to the CMD process when you "stop" the container. An
# interactive bash ignores SIGTERM, so docker/podman would have to wait for the "stop"
# timeout to expire and then SIGKILL the container.
# This scriptlet makes bash exit immediately when it receives SIGTERM.
CMD ["/usr/bin/bash", "-c", "exec /usr/bin/bash --rcfile <(echo '[ -f ~/.bashrc ] && source ~/.bashrc ; trap exit SIGTERM ;')" ]
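
The effect of the `trap exit SIGTERM` in the CMD can be tried outside a container. The sketch below is an illustration only: the helper `survives_sigterm` is a hypothetical name, and because an interactive bash is awkward to script, a child that ignores the signal with `trap '' SIGTERM` stands in for it.

```shell
#!/usr/bin/env bash
# Send SIGTERM to a child bash and report whether it is still running a few
# seconds later. $1 is the trap command the child installs before looping.
# (survives_sigterm is a hypothetical helper written for this illustration.)
survives_sigterm() {
    bash -c "$1; while :; do sleep 1; done" >/dev/null 2>&1 &
    local child=$!
    sleep 1                        # let the child install its trap
    kill -TERM "$child"
    sleep 3                        # allow time for the trap to run
    if kill -0 "$child" 2>/dev/null; then
        kill -KILL "$child"        # clean up, like the runtime's final SIGKILL
        wait "$child" 2>/dev/null
        echo yes
    else
        wait "$child" 2>/dev/null
        echo no
    fi
}

# Stand-in for an interactive bash, which ignores SIGTERM by default.
echo "ignoring SIGTERM:  survives=$(survives_sigterm "trap '' SIGTERM")"
# The trap installed by the CMD's rcfile scriptlet.
echo "trap exit SIGTERM: survives=$(survives_sigterm "trap exit SIGTERM")"
```

The first call should report `yes` and the second `no`. In the container, that difference is what lets a "stop" return promptly instead of waiting out the timeout.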