diff --git a/README.md b/README.md
index 5cc8344..3c86f2c 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,11 @@

microWakeWord NVIDIA Docker Trainer UI

- Screenshot 2026-04-14 at 11 02 06 PM
+ microWakeWord NVIDIA trainer screenshot
-Train custom microWakeWord models in Docker with:
+Train custom microWakeWord models in Docker with NVIDIA/CUDA acceleration, generated Piper samples, device-captured samples, reviewed false-wake negatives, live training logs, and ESPHome firmware flashing.
-- uploaded personal voice samples
-- automatically generated Piper TTS samples
-- a browser-based trainer UI
-- live training logs in a popup console
-
-This project no longer records audio in the browser. The UI is now upload-first: users add their own audio files, the app validates or converts them, and training runs from the same page.
+Real samples come from device-captured wake audio, close misses, or manual uploads. Every saved sample is normalized to `16 kHz / mono / 16-bit PCM WAV` before training.
 ---
@@ -27,41 +22,86 @@ docker pull ghcr.io/tatertotterson/microwakeword:latest
 ```bash
 docker run -d \
   --gpus all \
-  -p 8888:8888 \
+  -p 8789:8789 \
   -v $(pwd):/data \
   ghcr.io/tatertotterson/microwakeword:latest
 ```
-What these flags do:
+The flags:
-- `--gpus all` enables GPU acceleration
-- `-p 8888:8888` exposes the trainer UI
-- `-v $(pwd):/data` persists models, downloaded voices, datasets, and personal samples
+- `--gpus all` enables GPU acceleration.
+- `-p 8789:8789` exposes the trainer UI and captured-audio endpoint.
+- `-v $(pwd):/data` persists models, downloaded voices, datasets, samples, and firmware caches.
-Then open:
+Open:
 ```text
-http://localhost:8888
+http://localhost:8789
 ```
 ---
 ## What The UI Does
-- Start a wake word session
-- Test TTS pronunciation
-- Upload one or many personal samples
-- Normalize uploads to `16 kHz / mono / 16-bit PCM WAV`
-- Train with or without personal samples
-- Show a popup console with live progress and logs
-
-Personal samples are optional. If none are uploaded, the trainer can still proceed with TTS-only data after confirmation.
+- `Trainer` starts a wake-word session, shows positive/negative sample counts, and launches training.
+- `Captured Audio` reviews clips sent by ESPHome sats, including wake hits, close misses, and false wakes.
+- `Samples` plays, removes, clears, and manually imports personal or negative samples.
+- `Firmware` builds the latest `microWakeWords` ESPHome YAMLs from GitHub and flashes VoicePE or Satellite1 over OTA.
+- Popup consoles show colorized training and firmware logs while long-running jobs are active.
 ---
-## Personal Samples
+## Captured Audio Workflow
-Accepted upload formats include:
+To collect samples from a sat, flash it with the Tater firmware from [TaterTotterson/microWakeWords](https://github.com/TaterTotterson/microWakeWords). The `Firmware` tab can build and flash the VoicePE or Satellite1 YAMLs directly from that repo.
+
+After flashing, the device exposes ESPHome entities for capture setup:
+
+- `Capture Wake Audio` toggles upload of wake-word triggers.
+- `Capture Close Misses` toggles upload of near misses.
+- `Trainer App URL` sets the trainer address, for example `http://:8789`.
+
+ESPHome devices can send raw captured audio to:
+
+```text
+/api/upload_captured_audio_raw
+```
+
+Keep the training app running and reachable at the `Trainer App URL` while capture is enabled. The sats upload clips live; if the app is stopped or the URL is wrong, captured audio will not be saved.
+
+In the `Captured Audio` tab:
+
+- play each clip from the inbox
+- mark good wake-word clips as `This is good`
+- mark bad triggers as `False wake`
+- discard clips that should not be used
+
+Approved clips move into:
+
+```text
+/data/personal_samples/
+```
+
+False wakes move into:
+
+```text
+/data/negative_samples/
+```
+
+Captured audio is boosted for easier playback in the UI, then kept in the correct training format.
+
+---
+
+## Samples
+
+The `Samples` tab is the sample library.
+
+- `Personal` samples are positive examples of the wake word.
+- `Negative` samples are reviewed false wakes or hard negatives.
+- Both can be played back and removed one at a time.
+- Manual upload is available here as an optional seed path.
+
+Accepted manual upload formats include:
 - WAV
 - MP3
@@ -72,97 +112,120 @@ Accepted upload formats include:
 - OPUS
 - WEBM
-The backend validates or converts uploads with `ffmpeg` and stores the normalized files in:
+Uploads are validated or converted with `ffmpeg` into:
 ```text
-/data/personal_samples/
+16 kHz / mono / 16-bit PCM WAV
 ```
-Notes:
+Starting a new session does not clear samples. Use the clear buttons in `Samples` if you want to remove saved personal or negative clips.
-- starting a new session does not clear personal samples
-- use the `Clear personal samples` button if you want to wipe them
-- any uploaded personal samples are automatically included in training
+---
+
+## Training Flow
+
+1. Enter the wake phrase in `Trainer`.
+2. Choose the language.
+3. Optionally test pronunciation with `Test TTS`.
+4. Review the positive and negative sample counts.
+5. Click `Start training`.
+6. Watch the popup training console.
+
+Personal samples are optional. Training can run with zero personal samples after confirmation, using generated TTS samples and the stock negative datasets.
+
+Reviewed negative samples are converted into `/data/work/reviewed_negative_features/` and inserted into the training YAML as a hard-negative feature set when present.
 ---
 ## Language Support
-The language selector is dynamic.
+The language picker is dynamic.
-- `en` is always available
-- non-English languages are populated from Piper voice metadata
-- when you train with a non-English language, the backend downloads all Piper ONNX voices for that selected language only
-- it does not pre-download every language
-- already-downloaded voices are reused on later runs
+- `en` is always available.
+- English keeps the existing dedicated generator model path.
+- Non-English languages are discovered from the Piper voices catalog and any local Piper voice metadata.
+- When a non-English language is selected, the trainer downloads all voices for that selected language only.
+- Already-downloaded voices are reused.
+- It does not download every language up front.
-English stays on its existing dedicated generator model path. Non-English languages use the selected language's ONNX Piper voices.
-
-If the Piper catalog is unavailable, already-installed local voices can still be used.
+If the upstream Piper catalog is unavailable, already-installed local voices are used when available.
 ---
-## Training Behavior
+## Dataset Behavior
-1. Enter the wake word
-2. Optionally test pronunciation
-3. Optionally upload personal samples
-4. Click `Start training`
-5. Watch the popup console for:
-   - selected-language voice downloads when needed
-   - sample generation progress
-   - dataset setup
-   - training progress and completion
-
-The `Open console` button lets you reopen the log window after closing it.
-
----
-
-## First Run Notes
-
-The first real training run may download large training assets into `/data`, such as:
+The first training run downloads and prepares missing training assets into `/data`, including:
 - Piper voices for the selected language
-- training datasets and background data
-- Python training environment dependencies
+- negative datasets and background data
+- the Python training environment
+- generated samples and augmented feature caches
-These are reused later unless you delete `/data`.
+After those assets are prepared, later runs reuse the local copies unless the mounted `/data` contents are deleted.
+
+---
+
+## Firmware Flashing
+
+The `Firmware` tab builds and flashes Tater firmware for supported ESPHome sats.
+
+- Downloads the latest firmware YAML templates from `TaterTotterson/microWakeWords` on GitHub.
+- Lets you choose `VoicePE` or `Satellite1`.
+- Auto-detects ESPHome devices with mDNS when available.
+- Allows manual IP or hostname entry if discovery does not find the device.
+- Saves firmware form values so you do not re-enter sounds and URLs every run.
+- Lists locally trained wake words from `/data/trained_wake_words/` for easy model selection.
+- Builds with ESPHome and flashes OTA.
+- Streams ESPHome output in a colorized firmware console.
+
+Firmware YAMLs are intentionally pulled from GitHub each time. There is no local fallback path in the trainer UI.
 ---
 ## Output Files
-Successful runs produce:
+Successful runs produce timestamped training output folders such as:
 ```text
-/data/output/.tflite
-/data/output/.json
+/data/output/---/.tflite
+/data/output/---/.json
 ```
-If those files already exist, the trainer creates timestamped backups before replacing them.
+The trainer also syncs firmware-ready artifacts into:
+
+```text
+/data/trained_wake_words/.tflite
+/data/trained_wake_words/.json
+```
+
+The firmware tab uses `/data/trained_wake_words/` to populate the wake-word dropdown.
 ---
 ## Resetting Everything
-If you want a clean slate, stop the container and remove the contents of your mounted `/data` directory.
+If you want a clean slate, stop the container and remove the contents of the mounted `/data` directory.
-That will remove:
+That removes:
 - personal samples
+- negative samples
+- captured inbox clips
 - downloaded Piper voices
 - cached datasets
 - training environments
 - trained models
+- firmware build caches
 ---
-## Notes
+## Important Notes
-- browser microphone recording has been removed
-- personal samples are optional
-- the server module is now `trainer_server.py`
-- the launcher script is now `run.sh`
+- Personal samples are optional.
+- Negative samples are optional but useful for reducing false wakes.
+- The UI server is `trainer_server.py`.
+- The launcher is `run.sh`.
+- Firmware capture settings live on the ESPHome device and can be toggled from the device entities after flashing.
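The README changes above state that every saved sample is normalized to `16 kHz / mono / 16-bit PCM WAV`. A minimal sketch of how a clip could be checked against that target format, using only Python's standard `wave` module. The helper name `is_training_format` and the constants are hypothetical and do not come from this repository; the real app validates and converts with `ffmpeg`, which is not shown here.

```python
import wave

# Target format taken from the README: 16 kHz / mono / 16-bit PCM WAV.
TARGET_RATE_HZ = 16000
TARGET_CHANNELS = 1
TARGET_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def is_training_format(wav_path: str) -> bool:
    """Hypothetical helper: True if the file is already 16 kHz mono 16-bit PCM."""
    try:
        with wave.open(wav_path, "rb") as wav:
            return (
                wav.getframerate() == TARGET_RATE_HZ
                and wav.getnchannels() == TARGET_CHANNELS
                and wav.getsampwidth() == TARGET_SAMPLE_WIDTH_BYTES
                and wav.getcomptype() == "NONE"  # uncompressed PCM
            )
    except (wave.Error, EOFError, OSError):
        # Unreadable, missing, or non-WAV input; such a file would need
        # conversion (assumed here to be left to ffmpeg) before training.
        return False
```

A check like this only answers whether conversion can be skipped; files in other containers (MP3, OPUS, WEBM, and so on) always fail it and would go through the conversion path.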
 ---
diff --git a/cli/wake_word_sample_augmenter b/cli/wake_word_sample_augmenter
index 9e635af..83fb4c1 100644
--- a/cli/wake_word_sample_augmenter
+++ b/cli/wake_word_sample_augmenter
@@ -18,6 +18,8 @@ parser.add_argument("--output-dir", type=str, help="Wake word output dir. Defaul
 # Personal inputs/outputs (NEW)
 parser.add_argument("--personal-dir", type=str, help="Personal WAV dir. Default: /personal_samples", required=False)
 parser.add_argument("--personal-output-dir", type=str, help="Personal features output dir. Default: /work/personal_augmented_features", required=False)
+parser.add_argument("--negative-dir", type=str, help="Reviewed negative WAV dir. Default: /negative_samples", required=False)
+parser.add_argument("--negative-output-dir", type=str, help="Reviewed negative features output dir. Default: /work/reviewed_negative_features", required=False)
 # Dataset dirs
 parser.add_argument("--mit-rirs-16k-dir", type=str, help="MIT RIR input directory. Default: /training_datasets/mit_rirs_16k", required=False)
@@ -57,6 +59,17 @@ if not args.personal_output_dir:
 else:
     args.personal_output_dir = os.path.realpath(args.personal_output_dir)
+# Reviewed negative defaults
+if not args.negative_dir:
+    args.negative_dir = os.path.join(args.data_dir, "negative_samples")
+else:
+    args.negative_dir = os.path.realpath(args.negative_dir)
+
+if not args.negative_output_dir:
+    args.negative_output_dir = os.path.join(work_dir, "reviewed_negative_features")
+else:
+    args.negative_output_dir = os.path.realpath(args.negative_output_dir)
+
 # Dataset defaults
 if not args.mit_rirs_16k_dir:
     args.mit_rirs_16k_dir = os.path.join(args.data_dir, "training_datasets", "mit_rirs_16k")
@@ -205,7 +218,7 @@ def bind_wav_generator(clips_obj: Clips, wav_dir: str):
     clips_obj.audio_generator = types.MethodType(audio_generator_from_wavs, clips_obj)
-def generate_feature_set(input_wav_dir: str, out_root_dir: str, label: str):
+def generate_feature_set(input_wav_dir: str, out_root_dir: str, label: str, *, remove_silence: bool = True):
     files = glob.glob(os.path.join(input_wav_dir, "*.wav"))
     if not files:
         print(f"ℹ️ No WAVs found for {label} in: {input_wav_dir} (skipping)")
@@ -218,7 +231,7 @@ def generate_feature_set(input_wav_dir: str, out_root_dir: str, label: str):
         input_directory=input_wav_dir,
         file_pattern="*.wav",
         max_clip_duration_s=5,
-        remove_silence=True,
+        remove_silence=remove_silence,
         random_split_seed=10,
         split_count=0.1,
     )
@@ -263,9 +276,12 @@ def generate_feature_set(input_wav_dir: str, out_root_dir: str, label: str):
 # Wake word generated/TTS features (existing behavior)
 generate_feature_set(args.input_dir, args.output_dir, "generated")
-# Personal features (NEW)
+# Personal features
 generate_feature_set(args.personal_dir, args.personal_output_dir, "personal")
+# Reviewed false-positive / hard-negative features
+generate_feature_set(args.negative_dir, args.negative_output_dir, "reviewed negatives", remove_silence=False)
+
 END_TIME = datetime.now(timezone.utc).replace(microsecond=0)
 et = END_TIME - START_TIME
 print(f"\n{'=' * 80}")
diff --git a/cli/wake_word_sample_trainer b/cli/wake_word_sample_trainer
index f4fcda2..7abf52d 100644
--- a/cli/wake_word_sample_trainer
+++ b/cli/wake_word_sample_trainer
@@ -111,6 +111,16 @@ else
   echo "ℹ️ No personal features found at ${PERSONAL_FEATURES_DIR}/training (continuing without personal weighting)"
 fi
+# Reviewed false-positive features are optional hard negatives.
+REVIEWED_NEGATIVE_FEATURES_DIR="${WORK_DIR}/reviewed_negative_features"
+HAS_REVIEWED_NEGATIVE="false"
+if [ -d "${REVIEWED_NEGATIVE_FEATURES_DIR}/training" ] ; then
+  HAS_REVIEWED_NEGATIVE="true"
+  echo "✅ Found reviewed negative features: ${REVIEWED_NEGATIVE_FEATURES_DIR}/training (will weight as hard negatives)"
+else
+  echo "ℹ️ No reviewed negative features found at ${REVIEWED_NEGATIVE_FEATURES_DIR}/training (continuing with stock negatives)"
+fi
+
 cd "${WORK_DIR}"
 echo "===== Starting ${TRAINING_STEPS} training steps ====="
@@ -133,6 +143,7 @@ features:
     truth: true
     type: mmap
 __PERSONAL_FEATURE_MARKER__
+__REVIEWED_NEGATIVE_FEATURE_MARKER__
   - features_dir: __NEG_SPEECH__
     penalty_weight: 1.0
     sampling_weight: 12.0
@@ -208,6 +219,22 @@ else
   sed -i -e "/__PERSONAL_FEATURE_MARKER__/d" "${YAML_PATH}"
 fi
+# Insert/remove reviewed hard-negative block
+if [ "${HAS_REVIEWED_NEGATIVE}" = "true" ]; then
+  reviewed_negative_block="$(cat < ROOTDIR: ${ROOTDIR}"
@@ -42,10 +43,27 @@ if [[ ! -f "${PIN_FILE}" ]]; then
   ${PIP} install \
     "fastapi==${FASTAPI_VERSION}" \
     "uvicorn[standard]==${UVICORN_VERSION}" \
-    "python-multipart==${PY_MULTIPART_VERSION}"
+    "python-multipart==${PY_MULTIPART_VERSION}" \
+    "esphome==${ESPHOME_VERSION}"
   touch "${PIN_FILE}"
 else
   echo "Reusing existing trainer UI venv (no upgrades)"
+  if ! "${PY}" - "${ESPHOME_VERSION}" <<'PY' >/dev/null 2>&1
+import importlib.metadata
+import sys
+
+expected = sys.argv[1]
+installed = importlib.metadata.version("esphome")
+raise SystemExit(0 if installed == expected else 1)
+PY
+  then
+    echo "Firmware tab dependencies missing or stale; installing ESPHome firmware dependencies"
+    ${PIP} install \
+      "fastapi==${FASTAPI_VERSION}" \
+      "uvicorn[standard]==${UVICORN_VERSION}" \
+      "python-multipart==${PY_MULTIPART_VERSION}" \
+      "esphome==${ESPHOME_VERSION}"
+  fi
 fi
 # -----------------------------
@@ -54,6 +72,9 @@ fi
 export DATA_DIR="${DATA_DIR}"
 export STATIC_DIR="${ROOTDIR}/static"
 export PERSONAL_DIR="${DATA_DIR}/personal_samples"
+export CAPTURED_DIR="${DATA_DIR}/captured_audio"
+export NEGATIVE_DIR="${DATA_DIR}/negative_samples"
+export TRAINED_WAKE_WORDS_DIR="${DATA_DIR}/trained_wake_words"
 # IMPORTANT: leave training venv creation to /api/train inside trainer_server.py
 # but still set TRAIN_CMD so the server knows how to invoke training once ready
diff --git a/static/index.html b/static/index.html
index d0a668a..aa65071 100644
--- a/static/index.html
+++ b/static/index.html
@@ -67,6 +67,7 @@
     input[type="text"],
     input[type="number"],
+    input[type="password"],
     select {
       padding: 11px 12px;
       font-size: 15px;
@@ -77,6 +78,7 @@
       outline: none;
     }
     input[type="text"] { width: 420px; max-width: 100%; }
+    input[type="password"] { width: 260px; max-width: 100%; }
     input[type="number"] { width: 132px; }
     input::placeholder { color: rgba(236,236,241,0.36); }
@@ -191,6 +193,489 @@
       font-size: 13px;
     }
+    .field {
+      display: grid;
+      gap: 6px;
+      color: var(--muted);
+      font-size: 13px;
+    }
+
+    .field strong {
+      color: var(--text);
+      font-size: 14px;
+    }
+
+    .field input,
+    .field select {
+      width: 100%;
+    }
+
+    .firmwareGrid {
+      display: grid;
+      grid-template-columns: minmax(260px, 1fr) minmax(160px, 220px) minmax(220px, 280px);
+      gap: 12px;
+      align-items: end;
+    }
+
+    .firmwareHero {
+      position: relative;
+      overflow: hidden;
+      min-height:
176px; + background: + radial-gradient(520px 220px at 88% 0%, rgba(255,192,127,0.16), transparent 62%), + linear-gradient(135deg, rgba(255,138,42,0.12), rgba(255,255,255,0.035) 44%, rgba(255,255,255,0.02)); + } + + .firmwareHero::after { + content: ""; + position: absolute; + inset: auto -80px -120px auto; + width: 280px; + height: 280px; + border-radius: 999px; + border: 1px solid rgba(255,192,127,0.12); + background: radial-gradient(circle, rgba(255,138,42,0.12), transparent 66%); + pointer-events: none; + } + + .firmwareHero > * { + position: relative; + z-index: 1; + } + + .firmwareKicker { + color: var(--orange2); + font-size: 12px; + font-weight: 700; + letter-spacing: 0.14em; + text-transform: uppercase; + margin-bottom: 8px; + } + + .firmwareHero h3 { + font-size: clamp(24px, 4vw, 34px); + margin-bottom: 8px; + } + + .firmwareHero p { + max-width: 680px; + margin-bottom: 0; + } + + .firmwareSteps { + display: flex; + flex-wrap: wrap; + gap: 8px; + margin-top: 16px; + } + + .firmwareStepChip { + display: inline-flex; + align-items: center; + gap: 8px; + padding: 8px 10px; + border-radius: 999px; + border: 1px solid rgba(255,255,255,0.1); + background: rgba(0,0,0,0.2); + color: var(--muted); + font-size: 12px; + } + + .firmwareStepChip b { + color: var(--orange2); + font-weight: 700; + } + + .firmwareLayout { + display: grid; + grid-template-columns: 1fr; + gap: 14px; + align-items: start; + } + + .firmwarePanel { + margin-top: 0; + } + + .firmwarePanelHeader { + display: flex; + justify-content: space-between; + align-items: flex-start; + gap: 14px; + margin-bottom: 2px; + } + + .firmwarePanelTitle { + display: flex; + gap: 10px; + align-items: flex-start; + } + + .firmwareStepBadge { + flex: 0 0 auto; + display: grid; + place-items: center; + width: 30px; + height: 30px; + border-radius: 11px; + border: 1px solid rgba(255,138,42,0.36); + background: rgba(255,138,42,0.12); + color: var(--orange2); + font-weight: 800; + font-size: 13px; + } + + .firmwarePanel 
h3 { + margin-bottom: 4px; + } + + .firmwarePanel p { + margin-bottom: 0; + font-size: 13px; + } + + .firmwareTargetGrid { + display: grid; + grid-template-columns: minmax(220px, 1fr) minmax(220px, 1fr) minmax(120px, 160px); + gap: 12px; + align-items: end; + } + + .firmwareFields { + margin-top: 4px; + } + + .firmwareSettingsSection { + display: grid; + gap: 14px; + padding: 16px; + border-radius: 16px; + border: 1px solid rgba(255,255,255,0.08); + background: + linear-gradient(180deg, rgba(255,255,255,0.045), rgba(255,255,255,0.022)), + rgba(255,255,255,0.02); + } + + .firmwareSettingsGrid { + display: grid; + grid-template-columns: repeat(auto-fit, minmax(220px, 1fr)); + gap: 12px; + align-items: end; + } + + .readOnlyValue { + min-height: 43px; + display: flex; + align-items: center; + justify-content: space-between; + gap: 10px; + padding: 11px 12px; + border-radius: 12px; + border: 1px solid rgba(255,192,127,0.18); + background: + linear-gradient(180deg, rgba(255,138,42,0.08), rgba(255,255,255,0.025)), + rgba(0,0,0,0.24); + color: var(--text); + font-size: 15px; + } + + .readOnlyValue::after { + content: "Locked"; + flex: 0 0 auto; + padding: 3px 8px; + border-radius: 999px; + border: 1px solid rgba(255,192,127,0.22); + color: var(--orange2); + background: rgba(255,138,42,0.08); + font-size: 11px; + font-weight: 700; + letter-spacing: 0.04em; + text-transform: uppercase; + } + + .firmwareActionsPanel { + display: grid; + grid-template-columns: minmax(260px, 1fr) auto; + gap: 16px; + align-items: center; + margin-top: 0; + border-color: rgba(255,138,42,0.16); + background: + linear-gradient(135deg, rgba(255,138,42,0.1), rgba(255,255,255,0.035)), + var(--panel2); + } + + .firmwareActions { + justify-content: flex-end; + } + + .studioHero { + position: relative; + overflow: hidden; + min-height: 176px; + background: + radial-gradient(520px 220px at 88% 0%, rgba(255,192,127,0.16), transparent 62%), + linear-gradient(135deg, rgba(255,138,42,0.12), 
rgba(255,255,255,0.035) 44%, rgba(255,255,255,0.02)); + } + + .studioHero.trainerHero { + background: + radial-gradient(560px 240px at 88% 0%, rgba(255,192,127,0.15), transparent 62%), + radial-gradient(360px 220px at 4% 92%, rgba(57,212,160,0.08), transparent 64%), + linear-gradient(135deg, rgba(255,138,42,0.12), rgba(255,255,255,0.035) 44%, rgba(255,255,255,0.02)); + } + + .studioHero.captureHero { + background: + radial-gradient(540px 220px at 84% 0%, rgba(137,212,255,0.12), transparent 62%), + radial-gradient(420px 260px at 0% 88%, rgba(255,138,42,0.1), transparent 66%), + linear-gradient(135deg, rgba(255,255,255,0.055), rgba(255,138,42,0.06) 46%, rgba(255,255,255,0.02)); + } + + .studioHero::after { + content: ""; + position: absolute; + inset: auto -80px -120px auto; + width: 280px; + height: 280px; + border-radius: 999px; + border: 1px solid rgba(255,192,127,0.12); + background: radial-gradient(circle, rgba(255,138,42,0.12), transparent 66%); + pointer-events: none; + } + + .studioHero > * { + position: relative; + z-index: 1; + } + + .studioKicker { + color: var(--orange2); + font-size: 12px; + font-weight: 700; + letter-spacing: 0.14em; + text-transform: uppercase; + margin-bottom: 8px; + } + + .studioHero h3 { + font-size: clamp(24px, 4vw, 34px); + margin-bottom: 8px; + } + + .studioHero p { + max-width: 700px; + margin-bottom: 0; + } + + .studioSteps { + display: flex; + flex-wrap: wrap; + gap: 8px; + margin-top: 16px; + } + + .studioStepChip { + display: inline-flex; + align-items: center; + gap: 8px; + padding: 8px 10px; + border-radius: 999px; + border: 1px solid rgba(255,255,255,0.1); + background: rgba(0,0,0,0.2); + color: var(--muted); + font-size: 12px; + } + + .studioStepChip b { + color: var(--orange2); + font-weight: 700; + } + + .studioPanel { + margin-top: 0; + } + + .studioPanelHeader { + display: flex; + justify-content: space-between; + align-items: flex-start; + gap: 14px; + } + + .studioPanelTitle { + display: flex; + gap: 10px; + 
align-items: flex-start; + } + + .studioStepBadge { + flex: 0 0 auto; + display: grid; + place-items: center; + width: 30px; + height: 30px; + border-radius: 11px; + border: 1px solid rgba(255,138,42,0.36); + background: rgba(255,138,42,0.12); + color: var(--orange2); + font-weight: 800; + font-size: 13px; + } + + .studioPanel h3 { + margin-bottom: 4px; + } + + .studioPanel p, + .studioActionsPanel p { + margin-bottom: 0; + font-size: 13px; + } + + .phraseGrid { + display: grid; + grid-template-columns: minmax(260px, 1fr) minmax(180px, 240px) auto; + gap: 12px; + align-items: end; + } + + .phraseActions { + display: flex; + flex-wrap: wrap; + gap: 10px; + align-items: center; + } + + .sampleProgressCard { + padding: 14px; + border-radius: 16px; + border: 1px solid rgba(255,255,255,0.08); + background: + linear-gradient(180deg, rgba(255,255,255,0.04), rgba(255,255,255,0.02)), + rgba(255,255,255,0.02); + } + + .studioActionsPanel { + display: grid; + gap: 16px; + margin-top: 0; + border-color: rgba(255,138,42,0.16); + background: + linear-gradient(135deg, rgba(255,138,42,0.1), rgba(255,255,255,0.035)), + var(--panel2); + } + + .capturedControlPanel { + display: grid; + grid-template-columns: minmax(260px, 1fr) auto; + gap: 16px; + align-items: center; + } + + .capturedActions { + justify-content: flex-end; + } + + .sampleLibraryHeader { + display: grid; + grid-template-columns: minmax(260px, 1fr) auto; + gap: 16px; + align-items: center; + } + + .sampleTypeTabs { + display: inline-flex; + flex-wrap: wrap; + gap: 8px; + padding: 6px; + border-radius: 999px; + border: 1px solid rgba(255,255,255,0.08); + background: rgba(0,0,0,0.18); + } + + .sampleTypeBtn { + min-width: 124px; + padding: 8px 12px; + border-radius: 999px; + font-size: 13px; + background: transparent; + } + + .sampleTypeBtn.active { + border-color: rgba(255,138,42,0.42); + background: rgba(255,138,42,0.16); + color: var(--orange2); + } + + .tabs { + display: flex; + flex-wrap: wrap; + gap: 10px; + 
margin: 4px 0 0; + } + + .tabBtn { + min-width: 140px; + border-radius: 999px; + background: rgba(255,255,255,0.04); + } + + .tabBtn.active { + border-color: rgba(255,138,42,0.42); + background: linear-gradient(180deg, rgba(255,138,42,0.2), rgba(255,138,42,0.08)); + color: var(--orange2); + } + + .viewStack[hidden] { + display: none !important; + } + + .capturedList { + display: grid; + gap: 12px; + } + + .captureCard { + display: grid; + gap: 12px; + padding: 14px; + border-radius: 16px; + border: 1px solid rgba(255,255,255,0.08); + background: rgba(255,255,255,0.03); + } + + .captureTitle { + margin: 0; + font-size: 16px; + font-weight: 700; + } + + .captureSubtitle { + margin: 4px 0 0; + color: var(--muted); + font-size: 13px; + } + + .audioPlayer { + width: 100%; + border-radius: 12px; + } + + .captureActions { + display: flex; + flex-wrap: wrap; + gap: 10px; + } + + .emptyState { + padding: 18px; + border-radius: 16px; + border: 1px dashed rgba(255,255,255,0.12); + background: rgba(255,255,255,0.03); + color: var(--muted); + } + .consoleOverlay { position: fixed; inset: 0; @@ -201,13 +686,15 @@ background: rgba(4, 5, 10, 0.5); backdrop-filter: blur(10px); opacity: 0; + visibility: hidden; pointer-events: none; - transition: opacity 0.18s ease; - z-index: 50; + transition: opacity 0.18s ease, visibility 0.18s ease; + z-index: 10000; } .consoleOverlay.open { opacity: 1; + visibility: visible; pointer-events: auto; } @@ -292,6 +779,157 @@ margin-top: 2px; } + .firmwareLogModal { + position: fixed; + inset: 0; + display: flex; + align-items: center; + justify-content: center; + padding: 20px; + background: rgba(0, 0, 0, 0.64); + opacity: 0; + visibility: hidden; + pointer-events: none; + z-index: 12000; + } + + .firmwareLogModal.active { + visibility: visible; + pointer-events: auto; + animation: firmwareLogBackdropIn 280ms ease-out both; + } + + .firmwareLogDialog { + width: min(1320px, 98vw); + height: min(94vh, 1040px); + display: flex; + flex-direction: 
column; + gap: 12px; + padding: 18px; + border-radius: 22px; + border: 1px solid rgba(255,255,255,0.12); + background: + linear-gradient(180deg, rgba(17, 20, 28, 0.88), rgba(8, 10, 16, 0.96)), + rgba(8, 10, 16, 0.88); + box-shadow: 0 28px 84px rgba(0,0,0,0.58); + backdrop-filter: blur(18px) saturate(1.12); + overflow: hidden; + opacity: 0; + transform: translateY(14px) scale(0.97); + } + + .firmwareLogModal.active .firmwareLogDialog { + animation: firmwareLogDialogIn 300ms cubic-bezier(0.2, 0.8, 0.2, 1) both; + } + + .firmwareLogHeader { + display: flex; + justify-content: space-between; + align-items: flex-start; + gap: 16px; + } + + .firmwareLogTitle { + margin: 0; + font-size: 18px; + } + + .firmwareLogMeta { + margin: 6px 0 0; + color: var(--muted); + font-size: 13px; + } + + .firmwareLogStatus { + color: #b7c7e6; + font-size: 13px; + } + + .firmwareLogConsole { + flex: 1 1 auto; + min-height: 0; + overflow: auto; + border: 1px solid rgba(255,138,42,0.22); + border-radius: 16px; + padding: 12px 14px; + background: + linear-gradient(180deg, rgba(15, 18, 25, 0.98), rgba(10, 12, 18, 0.98)), + radial-gradient(circle at top, rgba(255,138,42,0.08), transparent 40%); + box-shadow: + inset 0 1px 0 rgba(255,255,255,0.05), + inset 0 -1px 0 rgba(0,0,0,0.35); + font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, monospace; + font-size: 13px; + line-height: 1.45; + color: #dbe8ff; + } + + .firmwareLogLine { + display: grid; + grid-template-columns: auto minmax(0, 1fr); + gap: 10px; + align-items: start; + padding: 2px 0; + } + + .firmwareLogLevel { + display: inline-flex; + align-items: center; + justify-content: center; + min-width: 58px; + border-radius: 999px; + padding: 2px 8px; + font-size: 11px; + font-weight: 800; + letter-spacing: 0.04em; + border: 1px solid rgba(255,255,255,0.12); + white-space: nowrap; + } + + .firmwareLogMessage { + white-space: pre-wrap; + word-break: break-word; + } + + .firmwareLogLine.tone-info .firmwareLogLevel { + 
background: rgba(70, 120, 255, 0.16); + border-color: rgba(108, 152, 255, 0.28); + color: #bdd3ff; + } + + .firmwareLogLine.tone-warn .firmwareLogLevel { + background: rgba(245, 167, 36, 0.14); + border-color: rgba(245, 167, 36, 0.32); + color: #ffd696; + } + + .firmwareLogLine.tone-error .firmwareLogLevel { + background: rgba(226, 76, 76, 0.15); + border-color: rgba(255, 111, 111, 0.34); + color: #ffc1c1; + } + + .firmwareLogLine.tone-debug .firmwareLogLevel { + background: rgba(92, 214, 178, 0.14); + border-color: rgba(92, 214, 178, 0.28); + color: #bffff0; + } + + .firmwareLogEmpty { + color: var(--muted); + } + + @keyframes firmwareLogBackdropIn { + from { opacity: 0; } + to { opacity: 1; } + } + + @keyframes firmwareLogDialogIn { + from { opacity: 0; transform: translateY(14px) scale(0.97); } + 62% { transform: translateY(-3px) scale(1.01); } + to { opacity: 1; transform: translateY(0) scale(1); } + } + .consoleMuted { color: #93a0b5; } .consoleText { color: #d8e2f1; } .consoleCmd { color: #89d4ff; } @@ -312,12 +950,63 @@ .wrap { padding: 18px 14px 30px; } input[type="text"] { width: 100%; } .fileItem { align-items: flex-start; flex-direction: column; } + .firmwareGrid { grid-template-columns: 1fr; } + .firmwareLayout, + .firmwareTargetGrid, + .firmwareActionsPanel { + grid-template-columns: 1fr; + } + .firmwareActions { + justify-content: stretch; + } + .firmwareActions button { + width: 100%; + } + .studioPanelHeader, + .capturedControlPanel { + grid-template-columns: 1fr; + } + .studioPanelHeader, + .capturedControlPanel { + flex-direction: column; + align-items: stretch; + } + .phraseGrid { + grid-template-columns: 1fr; + } + .phraseActions, + .capturedActions { + justify-content: stretch; + } + .phraseActions button, + .capturedActions button { + width: 100%; + } + .sampleLibraryHeader { + grid-template-columns: 1fr; + } .consoleOverlay { padding: 12px; } .consoleWindow { width: 100%; height: min(82vh, 760px); padding: 14px; } + .firmwareLogModal { + 
padding: 12px; + } + .firmwareLogDialog { + width: 100%; + height: 88vh; + padding: 14px; + } + .firmwareLogHeader { + flex-direction: column; + align-items: stretch; + } + .firmwareLogLine { + grid-template-columns: 1fr; + gap: 4px; + } .consoleHeader { flex-direction: column; align-items: stretch; @@ -340,80 +1029,342 @@
 [page-body hunk of static/index.html: markup was lost in extraction; only the changed text is recoverable]
-microWakeWord Personal Samples
-Start a session, upload your own recorded voice samples, and the app will validate or convert them into the training format used by the existing pipeline.
+microWakeWord Trainer Studio
+Train wake words, review captured clips, and flash ESPHome firmware from one local workspace.
-No session
-Optional Personal Samples
-Personal samples are optional. You can train with TTS only, or upload your own audio here and it will be saved into personal_samples/ as 16 kHz mono 16-bit PCM WAV.
-Idle
-Select one or many files
-WAV, MP3, M4A, FLAC, OGG, AAC, OPUS, and WEBM are all fine when ffmpeg is available. Files already in the correct format are kept as-is.
-No files selected
-No upload in progress
-0%
+Training Studio
+Build a Personal Wake Word
+Start with a phrase, review your positive and negative sample counts, then launch the training pipeline with a live console so every step is visible.
+1 Phrase + voice
+2 Review sample counts
+3 Train model
-When you upload, each file is checked and converted only if needed before it is written into personal_samples/.
-Uploaded
-0
+1
+Phrase + Voice
+Name the wake phrase and choose the language/voice set used to generate training audio.
+No session
-Training Format
-16 kHz / mono / 16-bit WAV
+2
+Train Wake Word
+Device-captured positives and reviewed negatives are used when present. Manual samples are managed from the Samples tab.
+Not started
+Positive Samples
+0
+Negative Samples
+0
+Training Format
+16 kHz / mono / 16-bit WAV
+The training console opens automatically when a run starts.