14.6 Anti-loop and Safety Recommendations

Orbnetes deployment and release orchestration documentation for operators and platform teams.

Automatic rollback introduces a risk of recursive failure loops if it is not designed carefully.

Recommended safety controls:

  1. No rollback-of-rollback recursion
    Do not enable policy chains in which a rollback release can itself trigger another rollback.
  2. Single authoritative check target
    Use one clear acceptance signal (a pipeline or a critical job) rather than several implicit signals.
  3. Known-good rollback source
    Ensure the selected release/version is valid and deployable before an incident occurs.
  4. Bounded retry strategy
    Do not treat rollback as an infinite retry mechanism. Escalate explicitly to an operator after the first rollback failure.
  5. Separation of critical vs optional job failures
    Combine allow_failure with check target design so that non-critical failures do not trigger an unnecessary, destructive rollback.
  6. Approval and notification awareness
    Ensure rollback-related events notify the correct stakeholders immediately.
  7. Runbook alignment
    Document rollback policy per service and keep it aligned with real on-call procedures.
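
Controls 1, 4, and 5 above can be illustrated with a small decision function. The following is a minimal sketch in Python, not Orbnetes code: the names JobResult, Decision, decide, and MAX_AUTO_ROLLBACKS are hypothetical and exist only for this example. It assumes the check target is "any failed job that is not marked allow_failure" and that the caller knows whether the failed release was itself produced by a rollback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Decision(Enum):
    NO_ACTION = "no_action"   # nothing critical failed
    ROLLBACK = "rollback"     # one automatic rollback is allowed
    ESCALATE = "escalate"     # hand over to an operator


@dataclass(frozen=True)
class JobResult:
    # Hypothetical record for illustration; not an Orbnetes data type.
    name: str
    passed: bool
    allow_failure: bool = False


# Assumption: a single automatic rollback per incident, then escalation.
MAX_AUTO_ROLLBACKS = 1


def decide(jobs: Iterable[JobResult],
           is_rollback_release: bool,
           rollbacks_attempted: int) -> Decision:
    # Control 5: only failures of non-optional jobs count as critical.
    critical_failure = any(
        not job.passed and not job.allow_failure for job in jobs
    )
    if not critical_failure:
        return Decision.NO_ACTION
    # Control 1: a failed rollback release never triggers another rollback.
    if is_rollback_release:
        return Decision.ESCALATE
    # Control 4: bounded retries; escalate once the budget is used up.
    if rollbacks_attempted >= MAX_AUTO_ROLLBACKS:
        return Decision.ESCALATE
    return Decision.ROLLBACK
```

With this shape, a failed optional job alone yields NO_ACTION, a failed critical job on a normal release yields ROLLBACK, and any critical failure on a rollback release goes straight to ESCALATE, which is what prevents the loop.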

Operational checklist before enabling rollback in production

  • rollback mode chosen intentionally,
  • check target validated on non-prod scenarios,
  • delay value reviewed by service owners,
  • known-good rollback source exists,
  • team understands expected behavior during an active incident.

This keeps rollback fast, safe, and operationally trustworthy.
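
The checklist above could also be encoded as a pre-flight gate that a team runs before switching rollback on. This is a sketch under stated assumptions: RollbackReadiness, its field names, and unmet_items are hypothetical and do not correspond to any Orbnetes setting; the values would come from the team's own review process.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RollbackReadiness:
    # Hypothetical fields mirroring the five checklist items.
    mode: Optional[str]                      # rollback mode chosen, or None
    check_target_validated_in_nonprod: bool
    delay_reviewed_by_owners: bool
    known_good_source: Optional[str]         # release/version id, or None
    team_briefed: bool


def unmet_items(readiness: RollbackReadiness) -> List[str]:
    """Return the checklist items that are not yet satisfied, in order."""
    checks = [
        (bool(readiness.mode), "rollback mode chosen intentionally"),
        (readiness.check_target_validated_in_nonprod,
         "check target validated on non-prod scenarios"),
        (readiness.delay_reviewed_by_owners,
         "delay value reviewed by service owners"),
        (bool(readiness.known_good_source),
         "known-good rollback source exists"),
        (readiness.team_briefed,
         "team understands expected behavior during an active incident"),
    ]
    return [label for ok, label in checks if not ok]
```

An empty result means every item is satisfied; anything else is a list of blockers to show the service owner before rollback is enabled in production.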