- isDevServerListening: key off curl EXIT CODE, not response time. The 2s
max-time treated a busy/compiling-but-listening dev server as DEAD, so the
ensure route restarted a healthy server on every refresh -> cold compile ->
the 502 -> no CSS -> broken images -> works flicker. Now dead only when BOTH
localhost and 0.0.0.0 refuse the connection (curl exit 7).
- ensure route: liveness probe is fail-safe (try/catch) -> never 500s or
needlessly restarts on a probe error; trusts the DB flag instead.
- dev container: reconcile dead orphan containers before resume/start so a
leftover name no longer triggers 'container name already in use' -> Traefik
gateway timeout.
- dev container: inject AUTH_SECRET / NEXTAUTH_SECRET / AUTH_TRUST_HOST so
scaffolded NextAuth apps stop throwing [auth][error] MissingSecret in preview.
- chat prompt: don't bounce a healthy dev server; only claim actions a tool
actually performed (no hallucinated DB deletes); state that NextAuth previews
come pre-wired.
- intent budgets: route 'not appearing/showing/missing' to diagnose; bump
status_check 12->16, diagnose 15->22 so investigations don't hit the cap.
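
A minimal sketch of the exit-code rule from the isDevServerListening entry
above. The real probe is not shown in this log, so the helper name
isListeningFromExitCodes and its input shape (one curl exit code per probed
host) are hypothetical. It relies only on curl's documented exit codes:
7 means the connection failed, 28 means the operation timed out.

```typescript
// curl exit codes, per the curl man page:
//   7  = failed to connect to host (nothing is listening)
//   28 = operation timed out (--max-time hit; the server may just be busy)
const CURL_FAILED_TO_CONNECT = 7;

// Hypothetical helper: takes one curl exit code per probed host
// (for example localhost and 0.0.0.0) and reports whether the dev
// server should be treated as listening.
function isListeningFromExitCodes(exitCodes: readonly number[]): boolean {
  // Assumption: with no probe results there is no evidence of death,
  // so stay fail-safe and report "listening".
  if (exitCodes.length === 0) return true;

  // Dead only when EVERY host refused the connection. A timeout or any
  // other exit code means something accepted the socket, or the result is
  // inconclusive, so the server is not bounced.
  const allRefused = exitCodes.every((code) => code === CURL_FAILED_TO_CONNECT);
  return !allRefused;
}
```

With this rule a compiling server that times out on both hosts (28, 28) still
counts as listening; only (7, 7) marks it dead.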
The dominant production failure was a dead dev-server process behind a
'running' DB flag (idle-stop / OOM / crash / host restart). The UI trusted the
flag and embedded the preview anyway -> permanent 502 until a manual restart.
- dev-container.ts: add isDevServerListening() fast liveness probe; stop the
container entrypoint from auto-running 'npx next dev --webpack' (it competed
with the managed server, forced the wrong bundler/cwd, and doubled memory);
drop the fake state='running' seed row; bump dev container memory 1g -> 2g.
- ensure route: verify a 'running' row is ACTUALLY listening and resurrect it
if dead, instead of trusting the flag; never bounce a healthy server.
- preview page: call ensure on every mount and on refresh (verify + heal), and
force an immediate anatomy refetch on (re)start so a dead frame swaps to
'warming up' without the 5s lag.
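
The verify-then-heal decision in the ensure route could look roughly like the
sketch below. The names (decideEnsureAction, EnsureAction) and the 'stopped'
state are hypothetical; only 'running' and 'starting' appear in this log. The
probe is passed in as a plain synchronous callback to keep the sketch small;
the real liveness probe is presumably asynchronous.

```typescript
// 'running' and 'starting' come from the log; 'stopped' is an assumed
// catch-all for rows that are not active.
type DevServerState = "running" | "starting" | "stopped";
type EnsureAction = "keep" | "restart" | "start";

// Hypothetical decision helper for the ensure route.
function decideEnsureAction(
  state: DevServerState,
  probeIsListening: () => boolean,
): EnsureAction {
  if (state === "stopped") return "start";

  // A 'starting' row is left alone: the readiness probe owns the
  // starting -> running flip once the port answers.
  if (state === "starting") return "keep";

  try {
    // 'running' is only a claim; verify it before trusting it.
    return probeIsListening() ? "keep" : "restart";
  } catch (_err) {
    // Fail-safe: a broken probe must not 500 or bounce a server that
    // may be healthy, so fall back to trusting the DB flag.
    return "keep";
  }
}
```

The only path that restarts a server is a 'running' row whose probe returned
a definite "not listening"; a probe error never does.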
Backstopped by the partial unique index + startDevServer idempotency, so heals
can never duplicate or thrash a server.
- Add partial unique index on (project_id, port) for active dev servers so the
SELECT-then-INSERT race can no longer create duplicate 'Port 3000' rows.
- Make startDevServer race-safe: on unique violation, adopt the winning row
instead of duplicating.
- ensure route no longer marks a server 'running' before it binds the port;
the readiness probe flips starting->running only after the port answers.
Kills the '502 -> broken CSS -> works' refresh loop.
- Deduplicate previews per-port in sortDevPreviewsFrontendFirst as a defensive
backstop for the dropdown.
- Revert iframe _refresh query-param hack (was forcing cold recompiles).
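
For illustration, the "adopt the winning row" behaviour of startDevServer can
be sketched against an in-memory stand-in for the table. Everything here is
hypothetical (DevServerTable, startDevServerRow, the row shape); the real code
would catch the database's unique-violation error from the partial unique
index rather than inspect a result value.

```typescript
interface DevServerRow {
  id: number;
  projectId: string;
  port: number;
  active: boolean;
}

type InsertResult =
  | { ok: true; row: DevServerRow }
  | { ok: false; reason: "unique_violation" };

// In-memory stand-in for the dev-server table. insertActive enforces the
// same rule as a partial unique index on (project_id, port) restricted to
// active rows.
class DevServerTable {
  private rows: DevServerRow[] = [];
  private nextId = 1;

  findActive(projectId: string, port: number): DevServerRow | undefined {
    return this.rows.find(
      (r) => r.active && r.projectId === projectId && r.port === port,
    );
  }

  insertActive(projectId: string, port: number): InsertResult {
    if (this.findActive(projectId, port)) {
      return { ok: false, reason: "unique_violation" };
    }
    const row: DevServerRow = { id: this.nextId++, projectId, port, active: true };
    this.rows.push(row);
    return { ok: true, row };
  }

  activeCount(): number {
    return this.rows.filter((r) => r.active).length;
  }
}

// Insert first and let the constraint arbitrate, instead of
// SELECT-then-INSERT. The loser of a race adopts the winner's row.
function startDevServerRow(
  table: DevServerTable,
  projectId: string,
  port: number,
): DevServerRow {
  const result = table.insertActive(projectId, port);
  if (result.ok) return result.row;

  const winner = table.findActive(projectId, port);
  if (!winner) {
    // The winning row disappeared between the insert and the lookup.
    throw new Error("unique violation but no active row found");
  }
  return winner;
}
```

Calling startDevServerRow twice for the same project and port returns the same
row both times, so repeated heals cannot create a second 'Port 3000' entry.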