6.6 Agent Status, Metrics, and Troubleshooting
Agent status and runtime metrics are your first diagnostic layer.
Typical status signals:
- online/offline/inactive,
- last heartbeat time,
- reported runner version,
- OS/platform/hostname,
- runtime metrics (CPU, memory, disk where available).
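As a rough illustration of how these signals can be read together, the sketch below models one agent status record and runs a first-pass check over it. The field names, the AgentStatus type, and the 90-second staleness threshold are all assumptions made for this example; they are not taken from the Orbnetes API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class AgentStatus:
    """Hypothetical status record; field names are illustrative only."""
    state: str                   # "online", "offline", or "inactive"
    last_heartbeat: datetime     # timezone-aware time of the last heartbeat
    runner_version: str
    os: str
    platform: str
    hostname: str
    # Metrics are optional because they are reported only where available.
    cpu_percent: Optional[float] = None
    memory_percent: Optional[float] = None
    disk_percent: Optional[float] = None


def heartbeat_is_stale(status: AgentStatus, now: datetime,
                       max_age: timedelta = timedelta(seconds=90)) -> bool:
    """Return True when the last heartbeat is older than max_age.

    The 90-second default is an assumed value, not a product setting.
    """
    return now - status.last_heartbeat > max_age


def first_look(status: AgentStatus, now: datetime) -> List[str]:
    """Collect the obvious findings from a single status record."""
    findings = []
    if status.state != "online":
        findings.append(f"state is {status.state}")
    if heartbeat_is_stale(status, now):
        findings.append("heartbeat is stale")
    for name in ("cpu_percent", "memory_percent", "disk_percent"):
        if getattr(status, name) is None:
            findings.append(f"{name} not reported")
    return findings


# Example usage with made-up values.
example_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
example = AgentStatus("online", example_now - timedelta(seconds=30),
                      "1.2.3", "linux", "amd64", "host-a", cpu_percent=10.0)
print(first_look(example, example_now))
```

An empty result means the status record alone shows nothing wrong, so the next place to look is the job-run log or the queue.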
Quick troubleshooting workflow
1. Agent not claiming jobs
- Verify agent is online.
- Verify project allows this agent.
- Verify blueprint job tags match agent tags.
- Check queue for blocked dependencies/approval waits.
2. Agent appears online but jobs fail immediately
- Inspect the job-run live log, starting at the first failing step.
- Check shell availability and permissions on host.
- Verify runtime config (secrets/vars) is present.
3. Version mismatch in UI
- Confirm running binary version on host.
- Confirm heartbeat payload includes updated agent version.
- Check that the service was restarted after the update.
- Verify update package target points to intended build.
4. Update fails or loops
- Inspect service logs for restart behavior.
- Validate package format and executable naming.
- Ensure API credentials and download endpoint are accessible.
- Roll back to known-good runner package if needed.
5. Disk or memory pressure
- Review runtime metrics from agent status.
- Clean runner work directories/artifact leftovers.
- Increase host capacity or split workload across more agents.
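The checks under item 1 (agent not claiming jobs) can be sketched as a single function that returns every reason it finds. This is a minimal sketch under stated assumptions: the Agent and Job types, the allowed_agents set, and the blocked_reason field are hypothetical, and tag matching is assumed to mean that every tag required by the job must be present on the agent. The actual matching rule may differ.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Agent:
    """Hypothetical agent view: name, online flag, and tags."""
    name: str
    online: bool
    tags: Set[str] = field(default_factory=set)


@dataclass
class Job:
    """Hypothetical queued job from a blueprint."""
    project: str
    required_tags: Set[str] = field(default_factory=set)
    blocked_reason: str = ""   # e.g. "waiting for approval"; empty if not blocked


def why_not_claimed(agent: Agent, job: Job, allowed_agents: Set[str]) -> List[str]:
    """Return the reasons this agent would not claim this job, in checklist order."""
    reasons = []
    if not agent.online:
        reasons.append("agent is not online")
    if agent.name not in allowed_agents:
        reasons.append(f"project {job.project} does not allow agent {agent.name}")
    # Assumed rule: the job's tags must be a subset of the agent's tags.
    missing = job.required_tags - agent.tags
    if missing:
        reasons.append("agent lacks tags: " + ", ".join(sorted(missing)))
    if job.blocked_reason:
        reasons.append("job is blocked: " + job.blocked_reason)
    return reasons
```

Returning all reasons rather than stopping at the first one helps when more than one condition is wrong at once, for example a missing tag on a job that is also waiting for approval.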
Operational best practices
- Keep at least one spare agent for critical tags.
- Monitor heartbeat freshness and queue depth together.
- Standardize runner versions per environment tier.
- Regularly test the fresh-install path, not only the upgrade path.
- Treat agent fleet as managed infrastructure, not ad-hoc hosts.
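The spare-agent practice above can be checked mechanically. The sketch below assumes the fleet is available as a list of (online, tags) pairs and treats "one spare" as at least two online agents per critical tag. Both the input shape and the min_online default are assumptions for illustration, not product behavior.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def tags_without_spare(agents: Iterable[Tuple[bool, Set[str]]],
                       critical_tags: Set[str],
                       min_online: int = 2) -> Dict[str, int]:
    """Map each under-covered critical tag to its count of online agents.

    A tag is under-covered when fewer than min_online online agents carry it.
    """
    online_per_tag: Counter = Counter()
    for online, tags in agents:
        if online:
            # Count each critical tag once per online agent that carries it.
            online_per_tag.update(tags & critical_tags)
    return {tag: online_per_tag[tag]
            for tag in sorted(critical_tags)
            if online_per_tag[tag] < min_online}
```

A check like this can run on a schedule, with any non-empty result treated as a capacity warning for the listed tags.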