15.5 Common Failure Patterns
Orbnetes deployment and release orchestration documentation for operators and platform teams.
Below are frequent failure classes and how to recognize them quickly.
1) Tag routing mismatch
Symptoms:
- jobs stay queued,
- no agent picks up the job despite an active pipeline.
Check:
- job tags vs agent tags,
- the project's allowed-agents mapping,
- agent online state.
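The three checks above can be sketched as one predicate. This is illustrative only (the Orbnetes scheduler is internal; function and parameter names here are assumptions), but it shows why a job stays queued when any single condition fails:

```python
# Hypothetical sketch of tag routing: a job is picked up only when the agent
# is online, is allowed for the project, and carries every tag the job needs.
def agent_can_run(job_tags, agent_tags, agent_online, agent_allowed):
    """Return True only if every routing condition holds."""
    return (
        agent_online                          # agent must be connected
        and agent_allowed                     # agent must be in the project's allowed set
        and set(job_tags) <= set(agent_tags)  # every job tag must exist on the agent
    )

# A job tagged ["linux", "docker"] stays queued on an agent missing "docker":
print(agent_can_run(["linux", "docker"], ["linux"], True, True))            # False
print(agent_can_run(["linux", "docker"], ["linux", "docker"], True, True))  # True
```

Note that tag matching is subset-based in this sketch: extra tags on the agent are harmless, but one missing tag blocks routing entirely.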
2) Missing secrets/variables
Symptoms:
- step fails immediately with missing env/config errors.
Check:
- key exists in expected scope (environment/project/global),
- environment selection at launch is correct,
- key name is an exact match.
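The scope checks above amount to a layered lookup. A minimal sketch, assuming environment overrides project, which overrides global (the function and scope representation are illustrative, not the Orbnetes implementation):

```python
# Illustrative scope resolution for variables/secrets. Lookup is exact and
# case-sensitive here, which is why near-miss key names resolve to nothing.
def resolve_key(key, env_vars, project_vars, global_vars):
    for scope_name, scope in (("environment", env_vars),
                              ("project", project_vars),
                              ("global", global_vars)):
        if key in scope:
            return scope_name, scope[key]
    return None, None  # the "missing env/config" failure case

scopes = ({"DB_URL": "postgres://env"}, {"DB_URL": "postgres://proj"}, {})
print(resolve_key("DB_URL", *scopes))   # ('environment', 'postgres://env')
print(resolve_key("db_url", *scopes))   # (None, None) -- exact match required
```

When debugging, resolve the key at each layer separately: a value present in the project scope but expected from the environment scope usually means the wrong environment was selected at launch.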
3) Missing release file binding
Symptoms:
- the deploy step references the release file variable, but its path/value is empty.
Check:
- source/tag/file selected in release,
- blueprint actually expects ORBN_RELEASE_FILE,
- selected artifact available from source.
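A small pre-deploy guard can distinguish the two failure shapes (unset binding vs. set-but-missing artifact). ORBN_RELEASE_FILE is the variable named above; the guard function itself is an illustrative sketch, not part of Orbnetes:

```python
import os

# Hypothetical pre-deploy check: separates "nothing bound" from
# "bound but the artifact never arrived on the agent host".
def check_release_binding(env=os.environ):
    path = env.get("ORBN_RELEASE_FILE", "")
    if not path:
        return "missing: no release file bound (check source/tag/file in the release)"
    if not os.path.exists(path):
        return "broken: %s is set but the artifact is not on disk" % path
    return "ok: %s" % path

print(check_release_binding({}))  # missing: no release file bound ...
```

Running a guard like this as the first line of the deploy step turns a confusing empty-path error later in the script into an immediate, classified failure.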
4) Dependency blocking (needs)
Symptoms:
- downstream job waits indefinitely, or is skipped/blocked after an upstream failure.
Check:
- upstream status,
- dependency chain in graph,
- failure policy/conditions.
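Walking the dependency chain by hand gets tedious on deep graphs. A sketch of the traversal (graph and status shapes are assumptions for illustration): find the first failed upstream that explains why a downstream job is stuck.

```python
# Hypothetical dependency-chain walk: `needs` maps job -> upstream jobs,
# `status` maps job -> terminal state. Returns the root failed upstream.
def first_blocking_upstream(job, needs, status):
    for dep in needs.get(job, []):
        if status.get(dep) == "failed":
            return dep
        found = first_blocking_upstream(dep, needs, status)
        if found:
            return found
    return None

needs = {"deploy": ["test"], "test": ["build"]}
status = {"build": "failed", "test": "blocked", "deploy": "waiting"}
print(first_blocking_upstream("deploy", needs, status))  # build
```

The answer to "why is deploy waiting?" is rarely deploy itself: fixing and rerunning the root failure (build here) is what unblocks the rest of the chain.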
5) Shell/runtime mismatch
Symptoms:
- command syntax errors, path issues, executable not found.
Check:
- shell type (bash vs powershell),
- agent OS/architecture expectations,
- tool presence on runner host.
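The "tool presence" check is quick to automate with the standard library; the tool list below is an example, not an Orbnetes requirement:

```python
import shutil

# Host-side probe: report which required executables are absent from PATH.
# Useful as a first step when a job fails with "command not found".
def missing_tools(required):
    return [t for t in required if shutil.which(t) is None]

# A step written for bash that lands on a host without it shows up here:
print(missing_tools(["definitely-not-installed-tool-xyz"]))
```

Running such a probe as the first step of a blueprint makes shell/runtime mismatches fail loudly at the top of the log rather than midway through a deploy.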
6) External service/network failures
Symptoms:
- timeout, connection refused, DNS/auth errors.
Check:
- target endpoint availability,
- network egress/firewall,
- credentials/token validity,
- transient vs persistent pattern across reruns.
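The transient-vs-persistent distinction is the key triage decision, because it dictates whether a plain rerun can help. A minimal classification sketch (the function and its inputs are illustrative):

```python
# Hypothetical classifier: collect pass/fail results across reruns of the
# same step and decide whether the fault pattern is transient or persistent.
def classify(attempt_results):
    """attempt_results: list of booleans, one per rerun attempt."""
    if all(attempt_results):
        return "healthy"
    if any(attempt_results):
        return "transient"   # mixed results: retry/backoff is a valid response
    return "persistent"      # fails every time: fix endpoint/credentials first

print(classify([False, True, True]))    # transient
print(classify([False, False, False]))  # persistent
```

A persistent pattern means rerunning wastes time and pipeline capacity; go straight to the endpoint, firewall, and credential checks above.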
7) Resource pressure on agent
Symptoms:
- step slowdown, random command failures, disk write errors.
Check:
- runtime metrics (CPU/memory/disk),
- work directory space,
- concurrent job load on same host.
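Disk pressure in the work directory is the easiest of these to check programmatically. A standard-library sketch; the 10% threshold is illustrative, not an Orbnetes default:

```python
import shutil

# Host-side snapshot: flag when free space in the work directory drops
# below a minimum ratio (disk-full is a common cause of "random" failures).
def disk_pressure(path=".", min_free_ratio=0.10):
    usage = shutil.disk_usage(path)
    free_ratio = usage.free / usage.total
    return free_ratio < min_free_ratio, round(free_ratio, 3)

under_pressure, ratio = disk_pressure(".", min_free_ratio=0.0)
print(under_pressure)  # False: the free ratio can never be below 0.0
```

Pair a check like this with the agent's runtime metrics: a single slow host serving many concurrent jobs often explains failures that look random at the pipeline level.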
Observability Response Checklist
When a run fails:
- Open pipeline graph and locate first failed/blocked branch.
- Open job live page and inspect first failed step output.
- Correlate status + duration + timestamp for context.
- Classify failure pattern (routing, config, runtime, dependency, external).
- Apply the rerun strategy (failed or all) with corrected inputs/config.
- Preserve logs for the incident record if there is production impact.
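The classification step of this checklist can be partially automated by matching the first failed step's error line against the failure classes above. A sketch only: the patterns are examples, and you should tune them to your own log formats.

```python
# Hypothetical first-pass triage: map an error line to one of the failure
# classes in this section. Order matters; first matching rule wins.
RULES = [
    ("routing",    ["no agent", "queued"]),
    ("config",     ["missing env", "variable not set", "not defined"]),
    ("dependency", ["blocked by", "upstream failed"]),
    ("external",   ["timeout", "connection refused", "dns"]),
    ("runtime",    ["command not found", "syntax error", "no space left"]),
]

def classify_failure(first_error_line):
    line = first_error_line.lower()
    for label, needles in RULES:
        if any(n in line for n in needles):
            return label
    return "unclassified"

print(classify_failure("curl: (7) Connection refused"))     # external
print(classify_failure("bash: kubectl: command not found")) # runtime
```

Even a crude classifier like this speeds up the checklist: it tells the operator which of the seven check lists above to open first.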
This workflow gives fast diagnosis while keeping evidence and actions traceable.