20.5 Incident Triage with Graph + Live Logs
Orbnetes deployment and release orchestration documentation for operators and platform teams.
Objective
Diagnose and mitigate a failed or stuck execution quickly using built-in runtime visibility.
Triage Workflow
- Open release or pipeline page.
- Inspect DAG graph:
  - find first failed node,
  - identify blocked dependents.
- Open corresponding live job page.
- Use step timeline to locate first failing step.
- Search logs for error signature (permission denied, not found, timeout, etc.).
- Classify failure type:
  - routing/tag,
  - config/secrets,
  - runtime/tooling,
  - external dependency/network.
- Decide recovery action:
  - rerun failed,
  - rerun all,
  - cancel,
  - rollback.
- Capture evidence (log download + IDs) for incident record.
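The graph-inspection step of the workflow above can be sketched in a few lines of Python. This is a minimal illustration, not Orbnetes code: the `edges` and `status` dictionaries, the status strings, and the function name `triage_dag` are all hypothetical stand-ins for whatever the graph view or an export of it provides. The sketch walks the DAG in dependency order, reports the first failed node, and lists every node downstream of it.

```python
from collections import deque


def triage_dag(edges, status):
    """Return (first_failed_node, blocked_dependents) for a pipeline DAG.

    edges:  dict mapping each node to the list of nodes that depend on it
            (hypothetical shape, assumed for this sketch).
    status: dict mapping every node to a status string; "failed" is an
            assumed value, not a documented Orbnetes status.
    """
    # Kahn's algorithm: compute a dependency-respecting order of the nodes.
    indegree = {node: 0 for node in status}
    for dependents in edges.values():
        for dep in dependents:
            indegree[dep] += 1

    queue = deque(sorted(n for n, deg in indegree.items() if deg == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dep in edges.get(node, []):
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)

    # The first failed node in dependency order is the triage starting point.
    first_failed = next((n for n in order if status[n] == "failed"), None)
    if first_failed is None:
        return None, []

    # Breadth-first walk to collect everything downstream of the failure.
    blocked, seen = [], {first_failed}
    queue = deque([first_failed])
    while queue:
        node = queue.popleft()
        for dep in edges.get(node, []):
            if dep not in seen:
                seen.add(dep)
                blocked.append(dep)
                queue.append(dep)
    return first_failed, blocked


# Hypothetical four-job pipeline used only to exercise the sketch.
edges = {"build": ["test", "lint"], "test": ["deploy"], "lint": ["deploy"], "deploy": []}
status = {"build": "success", "test": "failed", "lint": "success", "deploy": "blocked"}
print(triage_dag(edges, status))
```

In this example the triage target is the `test` job, and `deploy` is reported as blocked by it rather than as an independent failure.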
Success Criteria
- root cause category identified quickly,
- recovery action executed with minimal guesswork,
- incident evidence preserved (links/logs/status timeline).
Common Pitfalls
- focusing on final error line instead of first causal failure,
- rerunning repeatedly without correcting underlying config/routing issue,
- not checking approval/dependency gates before assuming runner failure.
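The first pitfall in the list above (reading the final error line instead of the first causal failure) can be avoided by scanning a downloaded log from the top and stopping at the earliest known signature. The sketch below assumes a plain-text log split into lines. The three signatures come from the triage workflow, but the signature-to-category mapping and the names `ASSUMED_SIGNATURES` and `first_causal_match` are illustrative assumptions that each team would replace with its own table.

```python
# Signature-to-category table. The signatures appear in the triage workflow;
# the category assigned to each one is an assumption made for this sketch.
ASSUMED_SIGNATURES = [
    ("permission denied", "config/secrets"),
    ("not found", "runtime/tooling"),
    ("timeout", "external dependency/network"),
]


def first_causal_match(log_lines):
    """Return (line_number, signature, category) for the earliest match.

    Scans from the top of the log so that a later, secondary error
    (for example a timeout caused by an earlier failure) is not mistaken
    for the root cause. Returns None when no signature matches.
    """
    for lineno, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        for signature, category in ASSUMED_SIGNATURES:
            if signature in lowered:
                return lineno, signature, category
    return None


# Hypothetical log excerpt used only to exercise the sketch.
sample_log = [
    "step 1: fetch artifact",
    "open /srv/app/config: Permission denied",
    "retrying...",
    "error: timeout waiting for step to exit",
]
print(first_causal_match(sample_log))
```

Here the last line reports a timeout, but the earliest signature is the permission error on line 2, which points at configuration rather than the network.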
Operational Note for Playbook Usage
Treat these playbooks as baseline templates. For production readiness, add service-specific guardrails:
- health-check gates,
- rollback eligibility rules,
- communication/escalation steps,
- post-deploy validation checklist.