
Monitoring & Health

Three health signals

| Signal | Source |
|---|---|
| Agent health | Agent.isActive + Agent.lastHeartbeat (updated from the agent's heartbeat POST) |
| Session health | Session.status enum + per-provider healthCheck() |
| Tunnel health | Tunnel provider's subprocess state |

See the Health concept page for details.
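As a rough illustration of how the two agent fields could combine into a single status, here is a minimal sketch. The function name `classify_agent`, the status labels, and the 90-second `STALE_AFTER` window are assumptions made for this example; the platform's actual staleness threshold is not stated on this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical staleness window; the real value is not documented here.
STALE_AFTER = timedelta(seconds=90)


def classify_agent(
    is_active: bool,
    last_heartbeat: Optional[datetime],
    now: Optional[datetime] = None,
) -> str:
    """Combine Agent.isActive and Agent.lastHeartbeat into one label.

    Returns 'inactive', 'stale', or 'healthy' (labels are illustrative).
    """
    now = now or datetime.now(timezone.utc)
    if not is_active:
        return "inactive"
    # An active agent with no heartbeat, or an old one, is treated as stale.
    if last_heartbeat is None or now - last_heartbeat > STALE_AFTER:
        return "stale"
    return "healthy"
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to check against recorded heartbeat timestamps.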

Where to watch

| Surface | What |
|---|---|
| Web app per-agent badge | Live agent status |
| Web app per-vibe / per-session view | Session + tunnel status |
| vibecontrols agents list --active | Quick CLI check |
| Local agent HTTP probe: GET /health | Process liveness |
| GraphQL subscriptions agentHealth, sessionOutput | Live updates |
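The local `GET /health` probe can be scripted with the Python standard library. The sketch below assumes the agent listens on `127.0.0.1:8080` and answers a healthy probe with HTTP 200. Both are assumptions for illustration, so substitute the address and success criteria of your own agent. The `opener` parameter exists only so the function can be exercised without a live agent.

```python
import urllib.error
import urllib.request

# Hypothetical address; use the host and port your local agent listens on.
DEFAULT_URL = "http://127.0.0.1:8080/health"


def probe_health(url=DEFAULT_URL, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if GET <url> answers with HTTP 200, else False.

    Connection errors, timeouts, and HTTP error statuses all count as
    'not alive' for the purposes of this liveness check.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Example use from a cron job or watchdog script:
#   if not probe_health():
#       restart or page someone
```

This only confirms that the agent process responds. It says nothing about session or tunnel health, which the other surfaces in the table cover.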

Metrics

On managed deployments, per-tenant metrics are exposed through the backend GraphQL tenantMetrics field. On self-hosted deployments, collect them with your own observability stack. Key metrics:

  • Active agents (current / 24h average)
  • Active sessions (by type)
  • Active tunnels (by provider)
  • AI tool events (by source)
  • Sandbox concurrency
  • Backend API p50/p95/p99 latency
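For self-hosted setups that compute latency percentiles themselves, the sketch below shows one common definition, the nearest-rank method. It is an illustration only: the page does not say which percentile definition the managed backend uses, and the sample latencies are made up.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to limit floating-point rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up request latencies in milliseconds.
latencies_ms = [12, 15, 11, 240, 18, 14, 13, 16, 95, 17]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)  # {'p50': 15, 'p95': 240, 'p99': 240}
```

With only ten samples, p95 and p99 both land on the slowest request, which is why tail percentiles are only meaningful over a reasonably large window.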

Alerts

Configure alerts through the platform's notification system, using these event types:

  • agent.disconnected
  • agent.heartbeat_stale
  • session.error
  • tunnel.failed
  • backup.failed

Each alert is delivered to the notification channels you have configured (Slack, email, PagerDuty, or custom).
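To reason about which events should reach which channels before configuring them, a routing table can be sketched in code. The event types below are the ones listed above. The `ROUTES` mapping, the channel names, the `DEFAULT_CHANNELS` fallback, and `channels_for` are all hypothetical and do not reflect the platform's actual configuration format.

```python
# Event types taken from the list above.
ALERT_EVENT_TYPES = {
    "agent.disconnected",
    "agent.heartbeat_stale",
    "session.error",
    "tunnel.failed",
    "backup.failed",
}

# Hypothetical routing policy: page for hard failures, chat for softer ones.
ROUTES = {
    "agent.disconnected": ["slack", "pagerduty"],
    "agent.heartbeat_stale": ["slack"],
    "tunnel.failed": ["slack"],
    "backup.failed": ["email", "pagerduty"],
}

# Hypothetical fallback for event types without an explicit route.
DEFAULT_CHANNELS = ["email"]


def channels_for(event_type):
    """Return the channels an event type should be sent to."""
    if event_type not in ALERT_EVENT_TYPES:
        raise ValueError(f"unknown alert event type: {event_type}")
    return ROUTES.get(event_type, DEFAULT_CHANNELS)
```

Rejecting unknown event types up front catches typos such as `agent.heartbeat-stale` before they silently fall through to the default channel.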

Incident response

For managed cloud incidents, check status.vibecontrols.com first. Tenant admins receive email notifications for incidents that affect them.

For self-hosted deployments, monitor your own infrastructure dashboards.

Next steps