Dudoxx Omni — Common Pitfalls (All Services)
One-page failure-mode catalog spanning STT, TTS, LLM, the wire envelope, and operations. Each entry has a symptom, root cause, and fix.
STT pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 403 on WS handshake | missing Authorization: Token … and no Sec-WebSocket-Protocol | Add either; if both are absent the server closes before accepting |
| Close 1003 unsupported-data | encoding the normalizer can't decode | Use linear16 16 kHz mono OR add the encoding to audio_normalize.py |
| Close 1008 channels-too-many / sample-rate-too-high | exceeds DDX_STT_AUDIO_MAX_* caps | Resample / downmix client-side, or raise the env caps |
| Close 1011 NET-0001 | 10s without audio AND without KeepAlive | Send {"type":"KeepAlive"} every 5s |
| Cannot unfreeze partially… server warnings | NeMo model.transcribe() race; cosmetic | No client action; the finalize-only _final_lock keeps finals correct |
| Long broadcasts emit only partials | No natural utterance gap → VAD never fires speech_end | Send {"type":"Finalize"} on a 30s timer |
| Concurrent N=3 sessions — two clients see 100% WER | Per-session VAD missing → shared SileroVadDetector | Already fixed: vad_factory returns a fresh detector per session |
| First minute of audio mis-detected as wrong language | Auto-detect window too short (<10s) | Already fixed: 20s sliding window for stable detection |
| "Mm" / "Uh" hallucinations on quiet mic | RMS gate disabled | Already fixed: silence-skip RMS gate on trailing 1s of window |
| Browser MediaRecorder webm-opus chunks rejected | Server treats opus as bytestream not full-frame | Send full webm chunks (mediaRecorder.start(250) 250ms tick); server normalizer accepts full-frame webm-opus |
| Italian transcription unreliable | No Italian primer + RAI broadcast 403 | Upstream issue; add IT TTS clip to app/services/lang_primers/ |
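The two timer-driven fixes above (KeepAlive every 5s against the NET-0001 idle close, Finalize every 30s for gapless broadcasts) can be sketched as a small control-frame pump. This is illustrative only: pump_controls and control_frame are hypothetical names, and ws stands for any connected websocket client object with an async send.

```python
import asyncio
import json


def control_frame(kind: str) -> str:
    """Build a control frame such as {"type": "KeepAlive"} or {"type": "Finalize"}."""
    return json.dumps({"type": kind})


async def pump_controls(ws, keepalive_every: float = 5.0, finalize_every: float = 30.0):
    """Run alongside the audio sender: KeepAlive every 5s avoids the
    10s idle close (NET-0001); Finalize every 30s forces finals when
    no natural utterance gap ever triggers the VAD's speech_end."""
    elapsed = 0.0
    while True:
        await asyncio.sleep(keepalive_every)
        elapsed += keepalive_every
        await ws.send(control_frame("KeepAlive"))
        if elapsed >= finalize_every:
            await ws.send(control_frame("Finalize"))
            elapsed = 0.0
```

In practice you would start this with asyncio.create_task next to the audio-sending coroutine and cancel it when the session ends.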
TTS pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 401 | missing X-API-Key / ?api_key | Add the header / query param |
| HTTP 422 unsupported language | not in SpeakRequest._LANGUAGE_PATTERN | Use a BCP-47 short tag (en, fr, de, it) |
| HTTP 422 unsupported sample_rate | not in {16000, 22050, 24000, 48000} | Pick a supported rate |
| Last word missing in CUDA TTS | client passed subtalker_dosample=true override | Remove override or set false; server default is false |
| WS closes 1011 mid-utterance | engine exception (Qwen3 / Kokoro init) | Check logs/tts.log; ./ddx-manage.sh restart --prod tts |
| SSE frames buffer at NGINX | proxy-buffering on | Add proxy_buffering off; and proxy_set_header X-Accel-Buffering no; |
| audio_b64 payload too large | one-shot response holding a 10s+ utterance | Switch to /v1/speak/sse for long text |
| Edits to dashboard.html don't show up | FastAPI caches the file at startup | ./ddx-manage.sh restart --prod tts, then hard-refresh the browser (Cmd-Shift-R) |
| Port-already-in-use after restart | old process had not yet released the port when start ran | Run ./ddx-manage.sh start --prod tts again |
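The one-shot vs. SSE choice above can be made mechanical with a tiny router. A minimal sketch, assuming a character-count heuristic: SSE_THRESHOLD_CHARS is a hypothetical cutoff you would tune, while the endpoint paths and the audio_b64 field come from the table.

```python
import base64

# Hypothetical cutoff: roughly where a one-shot audio_b64 body
# starts holding 10s+ of audio. Tune for your voices and rates.
SSE_THRESHOLD_CHARS = 400


def pick_endpoint(text: str) -> str:
    """Route long utterances to the SSE stream so the server never
    buffers one giant base64 audio body in a single response."""
    return "/v1/speak/sse" if len(text) > SSE_THRESHOLD_CHARS else "/v1/speak"


def decode_audio(payload: dict) -> bytes:
    """One-shot responses carry the audio as base64 in audio_b64."""
    return base64.b64decode(payload["audio_b64"])
```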
LLM pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 404 model_not_found on chat | requested model not in registry | GET /v1/models first and pick an id |
| HTTP 404 + detail="server not in real-model mode" on /v1/models/load | DDX_LLM_USE_REAL_MODEL=0 | Set the env var, then ./ddx-manage.sh restart --prod llm |
| HTTP 422 context_too_long | total tokens exceed the model's context_window | Truncate history or pick a larger-context model |
| HTTP 503 model_unavailable | model file missing / load failed | Check model_error in /health; verify models/cache/ |
| Stream stalls at first chunk | upstream still loading the model | Poll /v1/models/current.loaded === true before streaming |
| CORS error in browser | origin not in DDX_LLM_CORS_ORIGINS | Add origin or proxy through Next.js (preferred) |
| Mock mode never returns text | mock always returns tool_calls | Switch to real mode (DDX_LLM_USE_REAL_MODEL=1) or handle tool_calls in client |
| /v1/embeddings returns 404 | the dudoxx layer doesn't expose embeddings | Use a separate embeddings service |
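The poll-before-stream fix from the table can be sketched with the stdlib only. The /v1/models/current endpoint and its loaded flag come from the table; base_url, the 1s poll interval, and the function names are assumptions for illustration.

```python
import json
import time
import urllib.request


def is_ready(current: dict) -> bool:
    """True once /v1/models/current reports the model as loaded."""
    return bool(current.get("loaded"))


def wait_for_model(base_url: str, timeout: float = 120.0) -> None:
    """Poll /v1/models/current until loaded is true; only then start
    a chat stream, otherwise the first chunk stalls on model load."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/v1/models/current") as resp:
            if is_ready(json.load(resp)):
                return
        time.sleep(1.0)
    raise TimeoutError("model did not become loaded in time")
```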
Wire-envelope pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| Client breaks on a new field | strict schema parser rejects unknown keys | Envelope is additive — clients must ignore unknown keys |
| Mixing v1 and v2 paths in one session | breaking changes ship at /v2/* | Pick one path per session; v1 stays immutable |
| Hand-edited generated bindings drift | ddx-mlx-envelopes regenerated, custom edits lost | Never edit dist/; change schema + make gen |
| Visemes missing in TTS Audio frames | emit_visemes: false (default) or backend doesn't ship visemes | Pass emit_visemes: true; only CUDA backend emits PB-15 visemes |
| Word timestamps off by primer length | language primer prepended | Already fixed: server shifts word timestamps user-relative |
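The "clients must ignore unknown keys" rule from the first row can be enforced with a tolerant parser instead of a strict schema. A sketch under one loud assumption: the KNOWN field set below is purely illustrative, not the real envelope schema.

```python
import json

# Illustrative field set only -- substitute the fields your client
# actually consumes from the real envelope schema.
KNOWN = {"type", "seq", "ts", "payload"}


def parse_envelope(raw: str) -> dict:
    """Additive envelope: keep the keys we understand and silently
    drop the rest, so a new server-side field never breaks this client."""
    msg = json.loads(raw)
    return {k: v for k, v in msg.items() if k in KNOWN}
```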
Operations / ddx-manage.sh pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| ddx-manage.sh start hangs the agent | foreground watch mode | Always use --prod for service work; --dev is for human terminals |
| restart reports "did not become healthy" but lsof -i :PORT is empty | port freed during teardown | Re-run start --prod <svc> |
| Service silently using mock model | DDX_LLM_USE_REAL_MODEL not exported | DDX_LLM_USE_REAL_MODEL=1 ./ddx-manage.sh restart --prod llm |
| RSS climbing under load | Uvicorn workers too low | DDX_UVICORN_WORKERS=4 ./ddx-manage.sh restart --prod tts (~3.8GB/worker, RTF 2.0–2.7×) |
| Logs noisy with Cannot unfreeze partially… under load | NeMo internal race; recovered by a 3-attempt retry | Cosmetic; suppression via try_acquire(timeout=0) is a planned tweak |
Browser / Next.js pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| ws:// rejected from an https:// page | mixed content | Open wss:// to your edge; bridge to the ws:// upstream server-side |
| API key visible in browser dev tools | client called TTS / LLM directly | Always proxy via Next.js Route Handler; keep keys in process.env (no NEXT_PUBLIC_) |
| Server Action used for read-only data | misuse of Server Actions | Use Server Components + service functions for reads; Server Actions are for writes only |
| Hardcoded UI string in TSX | bypassed next-intl | Use t('key'); add to src/locales/{en,de,fr}/<ns>.json |
| await params missed in page.tsx | Next.js 16 made params async | const { locale } = await params; |
| Custom WS server doesn't pass cookies upstream | next route handlers don't see WS upgrades | Use a tiny server.mjs wrapping next + ws (see STT skill) |
NestJS pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| WebSocketGateway doesn't accept connections | default adapter is socket.io | app.useWebSocketAdapter(new WsAdapter(app)) from @nestjs/platform-ws |
| Streaming controller never flushes | response held in memory | Set 'X-Accel-Buffering': 'no', Connection: keep-alive, write chunks directly |
| Guard order wrong (auth fires after roles) | Nest order: global → controller → handler | Stack as ApiTokenGuard → JwtAuthGuard → RolesGuard (cardinal NestJS rule) |
| ConfigService.get('TTS_API_KEY') returns undefined | env not loaded, or a typo | Use getOrThrow('TTS_API_KEY') and load .env via ConfigModule.forRoot({ isGlobal: true }) |
Python pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| websockets.connect rejects custom headers | older websockets API | Pin websockets>=13 and use additional_headers=[(...)] (not extra_headers) |
| WS max_size too small for long streams | default is 1 MiB | max_size=2**24 (16 MiB) for STT; max_size=None for TTS audio frames |
| Client receives partial JSON | reading bytes instead of str | Filter if not isinstance(msg, str): continue before json.loads |
| Async cancellation leaks tasks | no timeout on consumer | Wrap in asyncio.wait_for(task, timeout=N) and cancel on TimeoutError |
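The four rows above combine into one client pattern: pinned headers, a raised frame cap, a text-only filter before json.loads, and a bounded read. stream_results is an illustrative wrapper, not a provided API; the websockets import sits inside it so the pure helper works on its own.

```python
import asyncio
import json


def text_payload(msg):
    """Only str frames carry JSON; binary frames are audio, skip them."""
    return json.loads(msg) if isinstance(msg, str) else None


async def stream_results(url: str, token: str, read_timeout: float = 30.0):
    """Yield decoded JSON events, giving up cleanly if the server stalls
    instead of leaving a recv task hanging forever."""
    import websockets  # >=13: additional_headers=, not the old extra_headers=

    async with websockets.connect(
        url,
        additional_headers=[("Authorization", f"Token {token}")],
        max_size=2**24,  # 16 MiB: the 1 MiB default truncates long streams
    ) as ws:
        while True:
            try:
                msg = await asyncio.wait_for(ws.recv(), timeout=read_timeout)
            except asyncio.TimeoutError:
                break  # bounded read: don't leak a stuck consumer task
            event = text_payload(msg)
            if event is not None:
                yield event
```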
Reference
- STT: ddx-cuda-live-stt/STT_API_USAGE.md, STT_FULL_CAPABILITIES.md
- TTS: ddx-cuda-live-tts/TTS_API_USAGE.md, TTS_FULL_CAPABILITIES.md
- LLM: ddx-mlx-llm/LLM_API_USAGE.md, LLM_API_ENDPOINTS.md
- Envelope: ddx-prd-specs/envelopes/README.md
- Service control: ./ddx-manage.sh status|start|stop|restart|logs <svc> (svc ∈ stt, tts, usage, llm, web, all)