Monitoring & Health
Three health signals
| Signal | Source |
|---|---|
| Agent health | Agent.isActive + Agent.lastHeartbeat (updated from the agent's heartbeat POST) |
| Session health | Session.status enum + per-provider healthCheck() |
| Tunnel health | Tunnel provider's subprocess state |
See Health concept.
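As a rough illustration of how the two agent fields can be combined into one status, here is a minimal sketch. The function name, the three status labels, and the 90-second staleness threshold are assumptions for illustration only; this page does not state the threshold the platform actually uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold: the maximum heartbeat age is not given in this doc.
STALE_AFTER = timedelta(seconds=90)


def agent_health(
    is_active: bool,
    last_heartbeat: Optional[datetime],
    now: datetime,
    stale_after: timedelta = STALE_AFTER,
) -> str:
    """Classify an agent as 'inactive', 'stale', or 'healthy'.

    is_active and last_heartbeat stand in for Agent.isActive and
    Agent.lastHeartbeat; the labels themselves are illustrative.
    """
    if not is_active:
        return "inactive"
    # An agent that never sent a heartbeat is treated like one that stopped.
    if last_heartbeat is None or now - last_heartbeat > stale_after:
        return "stale"
    return "healthy"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(agent_health(True, now - timedelta(seconds=30), now))
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to unit test against fixed timestamps.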
Where to watch
| Surface | What |
|---|---|
| Web app per-agent badge | Live agent status |
| Web app per-vibe / per-session view | Session + tunnel status |
| vibecontrols agents list --active | Quick CLI check |
| Local agent HTTP probe: GET /health | Process liveness |
| GraphQL subscriptions agentHealth, sessionOutput | Live updates |
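The local HTTP probe can be scripted for use in a watchdog or cron job. The sketch below uses only the Python standard library. The base URL and port in the usage line are placeholders, and treating any 2xx response as "alive" is an assumption, since this page does not describe the response body or status codes of GET /health.

```python
import http.client
import urllib.error
import urllib.request
from typing import Any, Callable


def probe_health(
    base_url: str,
    timeout: float = 2.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> bool:
    """Return True if GET {base_url}/health answers with a 2xx status.

    `opener` is injectable so the probe can be tested without a network.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTPError
        # all count as "not alive" for a liveness probe.
        return False


if __name__ == "__main__":
    # Placeholder address: substitute the host and port your agent listens on.
    print(probe_health("http://127.0.0.1:8080"))
```

A probe like this only confirms process liveness; session and tunnel state still need the web app or the GraphQL subscriptions.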
Metrics
Per-tenant metrics are exposed via the backend GraphQL tenantMetrics field (managed) or via your own observability stack (self-hosted). Key metrics:
- Active agents (current / 24h average)
- Active sessions (by type)
- Active tunnels (by provider)
- AI tool events (by source)
- Sandbox concurrency
- Backend API p50/p95/p99 latency
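For self-hosted setups that compute latency percentiles themselves, the p50/p95/p99 figures can be derived from raw request timings. This is a minimal sketch using the nearest-rank method; the function name and the sample values are illustrative, and the managed backend may use a different percentile definition.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [120.0, 80.0, 95.0, 300.0, 110.0]  # made-up sample data
    for p in (50, 95, 99):
        print(f"p{p} = {percentile(latencies_ms, p)} ms")
```

Nearest-rank always returns an observed sample, so a single slow request dominates p95 and p99 when the sample set is small.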
Alerts
Configure alerts through the platform's notification system using these event types:
- agent.disconnected
- agent.heartbeat_stale
- session.error
- tunnel.failed
- backup.failed
Each alert maps to your configured notification channels (Slack, email, PagerDuty, custom).
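To make the event-to-channel mapping concrete, here is a small sketch of a routing table. The event type names come from the list above; the specific channel assignments, the channel identifiers, and the `channels_for` helper are hypothetical and do not reflect the platform's actual configuration format.

```python
from typing import Dict, List, Sequence

# Event types named on this page.
EVENT_TYPES = (
    "agent.disconnected",
    "agent.heartbeat_stale",
    "session.error",
    "tunnel.failed",
    "backup.failed",
)

# Hypothetical routing: which channels receive which event type.
ROUTES: Dict[str, List[str]] = {
    "agent.disconnected": ["slack", "pagerduty"],
    "agent.heartbeat_stale": ["slack"],
    "tunnel.failed": ["slack"],
    "backup.failed": ["email", "pagerduty"],
}


def channels_for(
    event_type: str,
    routes: Dict[str, List[str]] = ROUTES,
    default: Sequence[str] = ("email",),
) -> List[str]:
    """Return the channels for an event, falling back to `default`
    when the event type has no explicit route."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    return list(routes.get(event_type, default))


if __name__ == "__main__":
    for event in EVENT_TYPES:
        print(event, "->", channels_for(event))
```

Rejecting unknown event types up front catches typos in alert rules before they silently drop notifications.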
Incident response
For managed cloud incidents: check status.vibecontrols.com first. Tenant admins receive email notifications for incidents that affect them.
For self-hosted: monitor your own infrastructure dashboards.