Feature Flags & Progressive Delivery

Intermediate

Feature flags let you ship code to production with a feature switched off, then turn it on gradually and turn it off instantly if something goes wrong. They are what make trunk-based development safe. But a flag is also a runtime branch and a piece of state, so it needs discipline: clear names, safe defaults, and removal once it has done its job.

A feature flag separates deploy (the code is in production) from release (users can see it). That lets you merge small, incomplete work to trunk safely, roll a feature out to 1% then 100%, and turn off a misbehaving feature without a redeploy. It is a core enabler of Trunk-Based Development and CI/CD.

The downside: every flag is a branch in behaviour that you must test in both states, and flags left lying around become confusing dead code and a risk. Treat them as temporary by default, and make the default the safe one.
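
A gradual rollout such as the 1% then 100% progression described above is commonly implemented by hashing a stable user identifier into a bucket, so the same user keeps getting the same answer as the percentage grows. The following is a minimal Python sketch of that idea, not a definitive implementation. The function names `bucket` and `is_enabled`, the 10,000-bucket granularity, and the user ID format are illustrative assumptions, not part of any particular flag library.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..9999."""
    # Including the flag name means different flags slice users differently,
    # so the same users are not always the first to see every new feature.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(flag_name: str, user_id: str, rollout_percent: float) -> bool:
    """Return True if the user falls inside the first rollout_percent of buckets."""
    # Clamp so a bad config value cannot go below 0% or above 100%.
    percent = max(0.0, min(100.0, rollout_percent))
    return bucket(flag_name, user_id) < percent * 100


# Hypothetical usage: start at 1%, widen later without reshuffling users.
show_new_payments = is_enabled("new-payments", "user-42", rollout_percent=1)
```

Because the bucket depends only on the flag name and user ID, a user who is enabled at 1% stays enabled at every higher percentage, and setting the percentage to 0 turns the feature off for everyone.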

Use flags well

Do: Use flags to merge incomplete work safely to trunk and to roll features out gradually (canary, then wider, then full) so problems show up on a small slice first.
Always: Default a flag to the safe, off, or current-behaviour state, so a missing or failed flag lookup fails closed, not into a half-built or risky path (see Designing for Failure).
Do: Name flags clearly and record their purpose, owner, and intended removal date. A flag with no owner is a future mystery.
Do: Keep flag evaluation fast and resilient. If the flag service is unreachable, fall back to the safe default rather than erroring.
Consider: Separating short-lived release flags (removed after rollout) from long-lived operational toggles and kill-switches (kept on purpose).
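
The fail-closed guidance above can be sketched as a small wrapper around whatever client performs the flag lookup. This is a hedged Python illustration, not the API of any real flag service: `get_or_default`, the `lookup` callable, and `broken_lookup` are hypothetical names introduced for the example.

```python
import logging
from typing import Callable

logger = logging.getLogger("flags")


def get_or_default(lookup: Callable[[str], bool], name: str, default: bool = False) -> bool:
    """Return the flag value, or `default` if the lookup fails for any reason."""
    try:
        value = lookup(name)
    except Exception:
        # Flag service unreachable, timed out, or flag unknown: fail closed.
        logger.warning("flag lookup failed for %r; using default %r", name, default)
        return default
    # Guard against a malformed response such as None or a string.
    return value if isinstance(value, bool) else default


def broken_lookup(name: str) -> bool:
    """Stand-in for a flag client whose backing service is down (assumption)."""
    raise ConnectionError("flag service unreachable")


# The caller still gets an answer, and it is the safe one.
use_new_path = get_or_default(broken_lookup, "new-payments", default=False)
```

Catching the failure inside the wrapper keeps the decision about the safe default in one place, so individual call sites cannot forget it.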

Don't let flags rot

Do: Remove a release flag (and the dead branch behind it) once the feature is fully rolled out. Clean-up is part of the work, not optional.
Do: Test both states of a flag, and remember the old behaviour still ships while the flag is off. Do not break it.
Avoid: Deeply nesting flags or letting them pile up. The combinations explode and nobody can reason about the real behaviour.
Avoid: Using a feature flag to gate a security control. Security must be on by construction, not dependent on a toggle someone could flip off (see Secure Defaults).
Never: Use a flag as a permanent substitute for finishing or properly removing work. That is hidden technical debt (see Technical Debt).
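
One way to keep flags from rotting is to record each flag's purpose, owner, kind, and removal date in code, then check periodically (for example in CI) for release flags that are past due. The sketch below is one possible shape for such a registry, written in Python under stated assumptions: `FlagSpec`, `FlagKind`, `overdue_flags`, and every flag name and date other than `new-payments` and its `payments` owner are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class FlagKind(Enum):
    RELEASE = "release"  # short-lived, removed after rollout
    OPS = "ops"          # long-lived toggle or kill-switch, kept on purpose


@dataclass(frozen=True)
class FlagSpec:
    name: str
    kind: FlagKind
    owner: str
    purpose: str
    remove_by: Optional[date] = None  # expected for every release flag


def overdue_flags(specs: List[FlagSpec], today: date) -> List[str]:
    """Names of release flags with no removal date, or one that has passed."""
    return [
        s.name
        for s in specs
        if s.kind is FlagKind.RELEASE and (s.remove_by is None or s.remove_by < today)
    ]


# Illustrative registry; the dates are made up for the example.
specs = [
    FlagSpec("new-payments", FlagKind.RELEASE, "payments", "New payments path", date(2024, 3, 1)),
    FlagSpec("payments-kill-switch", FlagKind.OPS, "payments", "Disable payments in an incident"),
    FlagSpec("new-search", FlagKind.RELEASE, "search", "New search ranking"),
]
print(overdue_flags(specs, today=date(2024, 6, 1)))  # ['new-payments', 'new-search']
```

Operational toggles are skipped deliberately, since they are meant to stay. A release flag with no removal date is reported as well, because a missing date usually means nobody planned the clean-up.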

Unsafe default, never removed

if (flags.Get("new-payments"))   // throws if service down
  useNewPayments();
// merged 8 months ago, fully on, flag + old path still here

A flag failure breaks payments instead of falling back, and long-dead code lingers behind a stale flag nobody dares delete. Both are avoidable.

Safe default, owned, scheduled

// default false; missing/failed lookup => old, working path
if (flags.GetOrDefault("new-payments", false)) useNewPayments();
// JIRA-1234: remove flag + old path after 100% rollout (owner: payments)

Failing closed keeps payments working, and the flag has an owner and a removal plan so it will not become permanent debt.

Self-review checklist

Ask: If the flag lookup fails, does behaviour fall back to the safe or current path?
Ask: Does this flag have a clear name, an owner, and a removal plan?
Ask: Have I tested both the on and off states?
Ask: Is this gating anything security-critical that should be on by construction instead?
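
For the "tested both states" question, one lightweight approach is to pass the flag value into the code under test and run the same test across both values. The Python sketch below uses the standard `unittest` module. The fee rules and the names `legacy_fee`, `new_fee`, and `fee` are invented purely for illustration and do not come from any real payments system.

```python
import unittest


def legacy_fee(amount_cents: int) -> int:
    """Current behaviour (hypothetical): 2% fee, rounded down."""
    return amount_cents * 2 // 100


def new_fee(amount_cents: int) -> int:
    """Flagged behaviour (hypothetical): 2% fee with a 30-cent minimum."""
    return max(30, amount_cents * 2 // 100)


def fee(amount_cents: int, new_fees_enabled: bool) -> int:
    """Choose the path from an explicit flag value so tests can set it."""
    return new_fee(amount_cents) if new_fees_enabled else legacy_fee(amount_cents)


class FeeTest(unittest.TestCase):
    def test_both_flag_states(self):
        # Expected fee on a 1000-cent payment for each flag state.
        expected_by_state = {False: 20, True: 30}
        for enabled, expected in expected_by_state.items():
            with self.subTest(new_fees_enabled=enabled):
                self.assertEqual(fee(1000, enabled), expected)
```

Passing the flag value in as a parameter, rather than reading it deep inside the function, makes the off state as easy to exercise as the on state. That matters because the old path is still what ships until the rollout completes.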

Why it matters: Feature flags are what let us integrate continuously, release gradually, and recover instantly. Left unmanaged, they become a tangle of stale toggles and surprising behaviour. Safe defaults, clear ownership, and disciplined removal keep them an asset for fast, safe delivery rather than a new source of risk.