
# GLaDOS TTS Server - Running Portal_GLaDOS_v1 on AMD RX 9060 XT (RDNA4) via ROCm Docker


## Overview

This server package runs the **Portal_GLaDOS_v1** voice model (based on the Style-Bert-VITS2 architecture, hosted on Hugging Face) using the PyTorch ROCm backend for GPU-accelerated inference. It serves speech over a standard HTTP TTS endpoint and can optionally expose a Wyoming-style audio streaming endpoint if your Home Assistant integration requires it.

**Model repository:** https://huggingface.co/WarriorMama777/GLaDOS_TTS/tree/main/Models/Style-Bert_VITS2/Portal_GLaDOS_v1

## Key Features of this Setup
- **AMD RDNA4 GPU acceleration**: Uses the ROCm build of PyTorch instead of the standard CUDA build. Targets the AMD RX 9060 XT and other ROCm-supported Radeon GPUs when running Docker on Linux x86_64 systems.
- **Multiple protocol endpoints for Home Assistant**: Supports both Wyoming-style audio streaming (with a fallback to plain HTTP) and an OpenAI-compatible TTS endpoint (`/v1/audio/speech`) used by many Home Assistant TTS integrations.
- **Graceful fallback**: If the GPU fails to initialize or the model weights cannot be loaded onto it, the server falls back to CPU inference instead of crashing. This matters for deployments on user hardware that is not NVIDIA-based.
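
The fallback behaviour described in the last bullet can be sketched in a few lines of Python. This is a minimal illustration, not the server's actual code: `pick_device` and `load_with_fallback` are hypothetical helper names, and `loader` stands in for whatever function loads the Style-Bert-VITS2 weights. The only PyTorch fact it relies on is that ROCm builds report AMD GPUs through the regular `torch.cuda` API.

```python
import importlib


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed, so there is nothing to accelerate.
        return "cpu"
    try:
        # ROCm builds of PyTorch expose AMD GPUs under the "cuda" device type.
        if torch.cuda.is_available():
            return "cuda"
    except Exception:
        # Driver or runtime initialisation failed; fall back instead of crashing.
        pass
    return "cpu"


def load_with_fallback(loader, device: str):
    """Call loader(device); if a GPU attempt raises, retry once on the CPU.

    `loader` is a hypothetical callable that takes a device string and
    returns a loaded model. Returns (model, device_actually_used).
    """
    try:
        return loader(device), device
    except Exception:
        if device == "cpu":
            # Already on the last resort: let the caller see the error.
            raise
        return loader("cpu"), "cpu"
```

A caller would typically write `model, device = load_with_fallback(my_loader, pick_device())` and log which device was finally used.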

## Prerequisites (before running `docker compose up`)
You will need:
- An AMD RX 9060 XT or another RDNA4 GPU, installed and enabled in the system BIOS, on a Linux x86_64 host. The container reaches the GPU through the `/dev/kfd` and `/dev/dri` device nodes (passed with `--device` or a `devices:` entry in the compose file). No NVIDIA runtime or CUDA emulation layer is involved.
- **PyTorch ROCm**: A ROCm build of PyTorch, available from the PyTorch wheel index or from AMD's `rocm/pytorch` images on Docker Hub. The host (for example Ubuntu 24.04 LTS) needs a working AMD GPU kernel driver; the ROCm user-space libraries live inside the image.
- **ROCm backend**: Detected automatically when the host has a valid AMD GPU driver and the device nodes are passed through. ROCm builds of PyTorch expose the GPU through the regular `torch.cuda` API, so in most cases there is no need to specify a ROCm version or any CUDA-related flags manually.
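
To confirm what the container actually sees, a short check like the one below can be run inside it. It is a sketch under one assumption: that a ROCm build of PyTorch is installed. On such builds `torch.version.hip` is a version string, while on CUDA or CPU-only builds it is `None`. The function name `describe_backend` is hypothetical.

```python
def describe_backend() -> dict:
    """Collect basic facts about the PyTorch build and the visible GPU."""
    info = {"torch": None, "hip": None, "gpu_available": False, "device_name": None}
    try:
        import torch
    except ImportError:
        # PyTorch is missing entirely; report the empty defaults.
        return info

    info["torch"] = torch.__version__
    # A version string on ROCm builds, None otherwise.
    info["hip"] = getattr(torch.version, "hip", None)
    # ROCm devices are reported through the torch.cuda API.
    info["gpu_available"] = bool(torch.cuda.is_available())
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_backend())
```

If `hip` is `None`, the image contains a CUDA or CPU-only wheel. If `hip` is set but `gpu_available` is `False`, the device nodes or driver are the more likely culprit.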

## Running the Server on Your Hardware
The example commands below start PyTorch with standard ROCm device detection and load the GLaDOS_TTS model from the Hugging Face Hub repository. If your host already has a working AMD GPU driver, you can run directly:

```bash
# Standard setup when the ROCm backend is available: build the images and start the stack
docker compose up --build

# Alternative: build with your own environment file for custom ROCm settings
# (the ROCm image itself is defined in Dockerfile.laDos-tys-rocm)
docker compose --env-file .env.example-u15429780-AMD-RDNA4 build

# Follow the logs until the TTS session begins or an error occurs
docker logs ladosp-tys-rocm --follow
```

Drop `--build` if you want to skip the image rebuild phase. The server downloads the model files and starts inference automatically, but first verify in the container logs that GPU detection worked properly.

## Integration with Home Assistant

Home Assistant provides several integration mechanisms for external TTS servers. The easiest approach is to configure either the standard HTTP endpoint or the Wyoming-style streaming endpoint.

A TTS integration or card such as `lovelace-tts-server` can connect directly to the exposed endpoint, which returns audio for the requested voice model. Add configuration along the following lines to your Home Assistant YAML files (the exact keys depend on the integration you use):

    default:
      name: GLaDOS Voice (Portal Style-Bert-VITS2)
      # URL where ROCm inference is running: use the LAN hostname or IP address
      # of the machine that serves the TTS endpoint
      url: http://your-amd-gpu-IP-or-DNS-name:8529/v1/speech
      type: tts-server  # standard tts-server protocol, not Wyoming

    # If you prefer a Wyoming-style session connection for advanced audio routing:
    default_2:
      name: GLaDOS_v1_with_Wyoming
      # Use the /wyoming path only if your client supports session streaming;
      # otherwise stick with the first entry above
      url: http://192.168.X.YZ:8529/wyoming/audio/stream
      model_url: https://huggingface.co/WarriorMama777/GLaDOS_TTS/tree/main/Models/Style-Bert_VITS2/Portal_GLaDOS_v1

## Configuring Wyoming Protocol Integration Directly (Advanced)

If your Home Assistant setup uses a Wyoming TTS integration or another client that explicitly requires protocol-specific session headers:

- Point the client at the Wyoming endpoint in whichever form its documentation specifies: the HTTP streaming response at `/wyoming/audio/stream` or a WebSocket connection. Home Assistant's built-in Wyoming integration is configured with a host and port rather than a URL path, so check which transport your client expects.
- The server supports both streaming and synchronous request/response patterns. Check your client's requirements before using pure WebSockets, which are not recommended unless you need low-latency audio playback in Home Assistant.

## Alternative: Using "TTS Stream" with the PyTorch Backend Directly

Instead of going through a TTS server protocol, you can send OpenAI-style requests straight to the `/v1/speech` endpoint and treat it as a generic HTTP audio source in your Home Assistant TTS configuration:

    # API_PORT is taken from the environment used by docker compose (default 8529).
    # Replace 0.0.0.0 with the server's hostname or IP when connecting from another machine.
    tts_server_url: "http://0.0.0.0:${API_PORT:-8529}/v1/speech"
    # Default voice ID for generation; the server resolves it to the model files,
    # fetching them with huggingface_hub if they are not already cached.
    default_voice_model_id: portal_gladios_v1
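
For testing the endpoint outside Home Assistant, a small client like the one below may help. It is a sketch with several assumptions that should be checked against the server: the JSON fields `model`, `input` and `voice` follow the OpenAI speech API convention and may not match what this server expects, the default path `/v1/speech` is the one used in the configuration above (the Key Features section names `/v1/audio/speech` instead, so `path` is a parameter), and the response body is assumed to be raw audio bytes. The function names are hypothetical.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice="portal_gladios_v1", path="/v1/speech"):
    """Build (but do not send) a POST request for the TTS endpoint.

    The payload layout is an assumption borrowed from OpenAI-style APIs.
    """
    payload = {"model": voice, "input": text, "voice": voice}
    url = base_url.rstrip("/") + path
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, text, out_path="glados_test.wav", **kwargs):
    """Send the request and write the returned bytes to out_path."""
    request = build_speech_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

For example, `synthesize("http://192.168.X.YZ:8529", "The cake is a lie.")` would write the response to `glados_test.wav` if the server accepts this payload shape.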

### Troubleshooting Common Issues when Running on RDNA4 / RX 9060 XT Hardware
- **ERROR: Device cannot be accessed from ROCm runtime:** Check that the GPU is enabled in the system BIOS and visible to the host (`lspci | grep -i -E 'vga|display'`, or `rocminfo` / `rocm-smi` if the ROCm tools are installed). Then check device permissions: the container needs `/dev/kfd` and `/dev/dri` passed through, and the user running it typically has to belong to the `video` and `render` groups.
- **Torch backend detection fails:** Make sure the image installs a ROCm build of PyTorch (from the PyTorch ROCm wheel index or an AMD `rocm/pytorch` base image) rather than the default CUDA or CPU-only wheel, and that the ROCm release in the image supports the RDNA4 architecture. No NVIDIA driver or CUDA stack is required.
- **Model weights cannot be loaded:** If you are using custom private or gated repositories, verify that your Hugging Face token is correct and set in the container environment. See the `.env.example` file for detailed guidance: `HF_TOKEN="your-huggingface-token"`, `HF_AUTH_TYPE=basic`.
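
The device-permission check from the first item can also be scripted. The sketch below assumes the usual ROCm device layout on Linux (`/dev/kfd` plus render nodes under `/dev/dri/renderD*`) and reports whether the current user can open each node for reading and writing. The function name `check_rocm_device_nodes` is hypothetical.

```python
import glob
import os


def check_rocm_device_nodes(kfd="/dev/kfd", render_glob="/dev/dri/renderD*"):
    """Map each expected device node to True if it exists and is read/write accessible."""
    nodes = [kfd] + sorted(glob.glob(render_glob))
    return {
        node: os.path.exists(node) and os.access(node, os.R_OK | os.W_OK)
        for node in nodes
    }


if __name__ == "__main__":
    for node, ok in check_rocm_device_nodes().items():
        print(f"{node}: {'ok' if ok else 'missing or not accessible'}")
```

Run it both on the host and inside the container. A node that is accessible on the host but missing in the container usually points at the device pass-through settings rather than the driver.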

### Additional Configuration Notes
- **GPU driver installation:** ROCm does not use NVIDIA drivers or a CUDA emulation layer, and Docker does not need an NVIDIA runtime for it. On Debian 12+ or Ubuntu LTS, install AMD's GPU driver on the host (for example with AMD's `amdgpu-install` tool) and pass `/dev/kfd` and `/dev/dri` into the container.
- **Memory issues:** If the PyTorch ROCm backend runs out of GPU VRAM (the source setup assumes more than 8 GB for large Style-Bert-VITS2 models like GLaDOS_TTS), lower the batch size by setting `MAX_BATCH_SIZE=4`. Allocator tuning on ROCm goes through PyTorch's caching-allocator settings (for example `PYTORCH_HIP_ALLOC_CONF`, the ROCm counterpart of `PYTORCH_CUDA_ALLOC_CONF`); the `PYTORCH_XLA_FLAGS="--device-type=xpu"` flag shown in the Dockerfile is not a ROCm allocator setting. Tuning is rarely necessary for typical Home Assistant TTS sessions.
