Before this fix, devcontainer.status was a read-only DB query that returned whatever state the row currently held. The state only flips provisioning→running via touchActivity() inside execInDevContainer. That created a deadlock: the AI polls devcontainer.status waiting for 'running'; status will never flip until something else execs. Caught live in smoke test 2026-05-01 (manifest project) — the AI fired devcontainer.status three times in a row, hit the loop guard, and surfaced the dead-end to the user. Two fixes: 1. getDevContainerStatus() now does a cheap 'true' exec probe when the row says 'provisioning'. If the probe lands, it flips the row to 'running' via touchActivity and reports selfHealed=true. If the probe fails AND the row is older than 120s, it reports likelyFailed=true so callers can stop polling and escalate. Also returns ageSeconds for the AI to reason about wait windows. Coolify's own service status is not used because dev containers have no fqdn/healthcheck and Coolify reports running:unknown for any such service forever. 2. New error-recovery rule 'devcontainer-still-provisioning' that fires whenever a status response contains state:'provisioning'. Tells the AI to send one status message, wait 15s, and prefer shell.exec (which lazy-provisions and proves reachability) over another devcontainer.status call. Explicit antipattern: do not poll status in a tight loop. Co-authored-by: Cursor <cursoragent@cursor.com>
32 KiB
32 KiB