Dudoxx Omni — Common Pitfalls (All Services)
One-page failure-mode catalog spanning STT, TTS, LLM, the wire envelope, and operations. Each entry has a symptom, root cause, and fix.
STT pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 403 on WS handshake | missing Authorization: Token … and no Sec-WebSocket-Protocol | Add either; if both are absent the server closes before accepting |
| Close 1003 unsupported-data | encoding the normalizer can't decode | Use linear16 16 kHz mono OR add the encoding to audio_normalize.py |
| Close 1008 channels-too-many / sample-rate-too-high | exceeds DDX_STT_AUDIO_MAX_* caps | Resample / downmix client-side, or raise the env caps |
| Close 1011 NET-0001 | 10s without audio AND without KeepAlive | Send {"type":"KeepAlive"} every 5s |
| Cannot unfreeze partially… server warnings | NeMo model.transcribe() race; cosmetic | No client action; the finalize-only _final_lock keeps finals correct |
| Long broadcasts emit only partials | No natural utterance gap → VAD never fires speech_end | Send {"type":"Finalize"} on a 30s timer |
| Concurrent N=3 sessions — two clients see 100% WER | Per-session VAD missing → shared SileroVadDetector | Already fixed: vad_factory returns a fresh detector per session |
| First minute of audio mis-detected as wrong language | Auto-detect window too short (<10s) | Already fixed: 20s sliding window for stable detection |
| "Mm" / "Uh" hallucinations on quiet mic | RMS gate disabled | Already fixed: silence-skip RMS gate on trailing 1s of window |
| Browser MediaRecorder webm-opus chunks rejected | Server treats opus as bytestream not full-frame | Send full webm chunks (mediaRecorder.start(250) 250ms tick); server normalizer accepts full-frame webm-opus |
| Italian transcription unreliable | No Italian primer + RAI broadcast 403 | Upstream issue; add IT TTS clip to app/services/lang_primers/ |
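The two timer-driven fixes above (KeepAlive every 5s against the NET-0001 idle close, Finalize every 30s for gapless broadcasts) can be sketched as a small control-frame pump. This is illustrative only: pump_controls and control_frame are hypothetical names, and ws stands for any connected websocket client object with an async send.

```python
import asyncio
import json


def control_frame(kind: str) -> str:
    """Build a control frame such as {"type": "KeepAlive"} or {"type": "Finalize"}."""
    return json.dumps({"type": kind})


async def pump_controls(ws, keepalive_every: float = 5.0, finalize_every: float = 30.0):
    """Run alongside the audio sender: KeepAlive every 5s avoids the
    10s idle close (NET-0001); Finalize every 30s forces finals when
    no natural utterance gap ever triggers the VAD's speech_end."""
    elapsed = 0.0
    while True:
        await asyncio.sleep(keepalive_every)
        elapsed += keepalive_every
        await ws.send(control_frame("KeepAlive"))
        if elapsed >= finalize_every:
            await ws.send(control_frame("Finalize"))
            elapsed = 0.0
```

In practice you would start this with asyncio.create_task next to the audio-sending coroutine and cancel it when the session ends.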
TTS pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 401 | missing X-API-Key / ?api_key | Add the header / query param |
| HTTP 422 unsupported language | not in SpeakRequest._LANGUAGE_PATTERN | Use a BCP-47 short tag (en, fr, de, it) |
| HTTP 422 unsupported sample_rate | not in {16000, 22050, 24000, 48000} | Pick a supported rate |
| Last word missing in CUDA TTS | client passed subtalker_dosample=true override | Remove override or set false; server default is false |
| WS closes 1011 mid-utterance | engine exception (Qwen3 / Kokoro init) | Check logs/tts.log; ./ddx-manage.sh restart --prod tts |
| SSE frames buffer at NGINX | proxy-buffering on | Add proxy_buffering off; and proxy_set_header X-Accel-Buffering no; |
| audio_b64 payload too large | one-shot response holding a 10s+ utterance | Switch to /v1/speak/sse for long text |
| Edits to dashboard.html don't show up | FastAPI caches the file at startup | ./ddx-manage.sh restart --prod tts, then hard-refresh the browser (Cmd-Shift-R) |
| Port-already-in-use after restart | old process had not yet released the port when start ran | Run ./ddx-manage.sh start --prod tts again |
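The one-shot vs. SSE choice above can be made mechanical with a tiny router. A minimal sketch, assuming a character-count heuristic: SSE_THRESHOLD_CHARS is a hypothetical cutoff you would tune, while the endpoint paths and the audio_b64 field come from the table.

```python
import base64

# Hypothetical cutoff: roughly where a one-shot audio_b64 body
# starts holding 10s+ of audio. Tune for your voices and rates.
SSE_THRESHOLD_CHARS = 400


def pick_endpoint(text: str) -> str:
    """Route long utterances to the SSE stream so the server never
    buffers one giant base64 audio body in a single response."""
    return "/v1/speak/sse" if len(text) > SSE_THRESHOLD_CHARS else "/v1/speak"


def decode_audio(payload: dict) -> bytes:
    """One-shot responses carry the audio as base64 in audio_b64."""
    return base64.b64decode(payload["audio_b64"])
```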
LLM pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 404 model_not_found on chat | requested model not in registry | GET /v1/models first and pick an id |
| HTTP 404 + detail="server not in real-model mode" on /v1/models/load | DDX_LLM_USE_REAL_MODEL=0 | Set the env var, then ./ddx-manage.sh restart --prod llm |
| HTTP 422 context_too_long | total tokens exceed the model's context_window | Truncate history or pick a larger-context model |
| HTTP 503 model_unavailable | model file missing / load failed | Check model_error in /health; verify models/cache/ |
| Stream stalls at first chunk | upstream still loading the model | Poll /v1/models/current.loaded === true before streaming |
| CORS error in browser | origin not in DDX_LLM_CORS_ORIGINS | Add origin or proxy through Next.js (preferred) |
| Mock mode never returns text | mock always returns tool_calls | Switch to real mode (DDX_LLM_USE_REAL_MODEL=1) or handle tool_calls in client |
| /v1/embeddings returns 404 | the dudoxx layer doesn't expose embeddings | Use a separate embeddings service |
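The poll-before-stream fix from the table can be sketched with the stdlib only. The /v1/models/current endpoint and its loaded flag come from the table; base_url, the 1s poll interval, and the function names are assumptions for illustration.

```python
import json
import time
import urllib.request


def is_ready(current: dict) -> bool:
    """True once /v1/models/current reports the model as loaded."""
    return bool(current.get("loaded"))


def wait_for_model(base_url: str, timeout: float = 120.0) -> None:
    """Poll /v1/models/current until loaded is true; only then start
    a chat stream, otherwise the first chunk stalls on model load."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/v1/models/current") as resp:
            if is_ready(json.load(resp)):
                return
        time.sleep(1.0)
    raise TimeoutError("model did not become loaded in time")
```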
Wire-envelope pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| Client breaks on a new field | strict schema parser rejects unknown keys | Envelope is additive — clients must ignore unknown keys |
| Mixing v1 and v2 paths in one session | breaking changes ship at /v2/* | Pick one path per session; v1 stays immutable |
| Hand-edited generated bindings drift | ddx-mlx-envelopes regenerated, custom edits lost | Never edit dist/; change schema + make gen |
| Visemes missing in TTS Audio frames | emit_visemes: false (default) or backend doesn't ship visemes | Pass emit_visemes: true; only CUDA backend emits PB-15 visemes |
| Word timestamps off by primer length | language primer prepended | Already fixed: server shifts word timestamps user-relative |
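The "clients must ignore unknown keys" rule from the first row can be enforced with a tolerant parser instead of a strict schema. A sketch under one loud assumption: the KNOWN field set below is purely illustrative, not the real envelope schema.

```python
import json

# Illustrative field set only -- substitute the fields your client
# actually consumes from the real envelope schema.
KNOWN = {"type", "seq", "ts", "payload"}


def parse_envelope(raw: str) -> dict:
    """Additive envelope: keep the keys we understand and silently
    drop the rest, so a new server-side field never breaks this client."""
    msg = json.loads(raw)
    return {k: v for k, v in msg.items() if k in KNOWN}
```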
Operations / ddx-manage.sh pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| ddx-manage.sh start hangs the agent | foreground watch mode | Always use --prod for service work; --dev is for human terminals |
| restart reports "did not become healthy" but lsof -i :PORT is empty | port freed during teardown | Re-run start --prod <svc> |
| Service silently using mock model | DDX_LLM_USE_REAL_MODEL not exported | DDX_LLM_USE_REAL_MODEL=1 ./ddx-manage.sh restart --prod llm |
| RSS climbing under load | Uvicorn workers too low | DDX_UVICORN_WORKERS=4 ./ddx-manage.sh restart --prod tts (~3.8GB/worker, RTF 2.0–2.7×) |
| Logs noisy with Cannot unfreeze partially… under load | NeMo internal race; recovered by a 3-attempt retry | Cosmetic; suppression via try_acquire(timeout=0) is a planned tweak |
Browser / Next.js pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| ws:// rejected from an https:// page | mixed content | Open wss:// to your edge; bridge to the ws:// upstream server-side |
| API key visible in browser dev tools | client called TTS / LLM directly | Always proxy via Next.js Route Handler; keep keys in process.env (no NEXT_PUBLIC_) |
| Server Action used for read-only data | misuse of Server Actions | Use Server Components + service functions for reads; Server Actions are for writes only |
| Hardcoded UI string in TSX | bypassed next-intl | Use t('key'); add to src/locales/{en,de,fr}/<ns>.json |
| await params missed in page.tsx | Next.js 16 made params async | const { locale } = await params; |
| Custom WS server doesn't pass cookies upstream | next route handlers don't see WS upgrades | Use a tiny server.mjs wrapping next + ws (see STT skill) |
NestJS pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| WebSocketGateway doesn't accept connections | default adapter is socket.io | app.useWebSocketAdapter(new WsAdapter(app)) from @nestjs/platform-ws |
| Streaming controller never flushes | response held in memory | Set 'X-Accel-Buffering': 'no', Connection: keep-alive, write chunks directly |
| Guard order wrong (auth fires after roles) | Nest order: global → controller → handler | Stack as ApiTokenGuard → JwtAuthGuard → RolesGuard (cardinal NestJS rule) |
| ConfigService.get('TTS_API_KEY') returns undefined | env not loaded, or a typo | Use getOrThrow('TTS_API_KEY') and load .env via ConfigModule.forRoot({ isGlobal: true }) |
Python pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| websockets.connect rejects custom headers | older websockets API | Pin websockets>=13 and use additional_headers=[(...)] (not extra_headers) |
| WS max_size too small for long streams | default is 1 MiB | max_size=2**24 (16 MiB) for STT; max_size=None for TTS audio frames |
| Client receives partial JSON | reading bytes instead of str | Filter if not isinstance(msg, str): continue before json.loads |
| Async cancellation leaks tasks | no timeout on consumer | Wrap in asyncio.wait_for(task, timeout=N) and cancel on TimeoutError |
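The four rows above combine into one client pattern: pinned headers, a raised frame cap, a text-only filter before json.loads, and a bounded read. stream_results is an illustrative wrapper, not a provided API; the websockets import sits inside it so the pure helper works on its own.

```python
import asyncio
import json


def text_payload(msg):
    """Only str frames carry JSON; binary frames are audio, skip them."""
    return json.loads(msg) if isinstance(msg, str) else None


async def stream_results(url: str, token: str, read_timeout: float = 30.0):
    """Yield decoded JSON events, giving up cleanly if the server stalls
    instead of leaving a recv task hanging forever."""
    import websockets  # >=13: additional_headers=, not the old extra_headers=

    async with websockets.connect(
        url,
        additional_headers=[("Authorization", f"Token {token}")],
        max_size=2**24,  # 16 MiB: the 1 MiB default truncates long streams
    ) as ws:
        while True:
            try:
                msg = await asyncio.wait_for(ws.recv(), timeout=read_timeout)
            except asyncio.TimeoutError:
                break  # bounded read: don't leak a stuck consumer task
            event = text_payload(msg)
            if event is not None:
                yield event
```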
Reference
- STT: ddx-cuda-live-stt/STT_API_USAGE.md, STT_FULL_CAPABILITIES.md
- TTS: ddx-cuda-live-tts/TTS_API_USAGE.md, TTS_FULL_CAPABILITIES.md
- LLM: ddx-mlx-llm/LLM_API_USAGE.md, LLM_API_ENDPOINTS.md
- Envelope: ddx-prd-specs/envelopes/README.md
- Service control: ./ddx-manage.sh status|start|stop|restart|logs <svc> (svc ∈ stt, tts, usage, llm, web, all)