Feature Flags & Progressive Delivery
Feature flags let you ship code to production with a feature switched off, then turn it on gradually and turn it off instantly if something goes wrong. They are what make trunk-based development safe. But a flag is also a runtime branch and a piece of state, so it needs discipline: clear names, safe defaults, and removal once it has done its job.
A feature flag separates deploy (the code is in production) from release (users can see it). That lets you merge small, incomplete work to trunk safely, roll a feature out to 1% then 100%, and turn off a misbehaving feature without a redeploy. It is a core enabler of Trunk-Based Development and CI/CD.
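One common way to implement the gradual rollout described above is to hash each user into a stable bucket and compare it with the current percentage. The sketch below is illustrative only: `rollout_bucket`, `is_enabled`, and the `new-payments` flag name are hypothetical, and a real flag service would typically do this for you.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    # Hashing the flag name together with the user id means a given user
    # does not land in the same early slice for every flag.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True when the user falls inside the current rollout percentage."""
    return rollout_bucket(flag_name, user_id) < rollout_percent
```

Because the bucket is deterministic, raising `rollout_percent` from 1 to 100 only ever adds users, and setting it to 0 releases the feature to nobody without a redeploy.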
The downside: every flag is a branch in behaviour that you must test in both states, and flags left lying around become confusing dead code and a risk. Treat them as temporary by default, and make the default the safe one.
Use flags well
- Do: Use flags to merge incomplete work safely to trunk and to roll features out gradually (canary, then wider, then full) so problems show up on a small slice first.
- Always: Default a flag to the safe, off, or current-behaviour state, so a missing or failed flag lookup fails closed, not into a half-built or risky path (see Designing for Failure).
- Do: Name flags clearly and record their purpose, owner, and intended removal date. A flag with no owner is a future mystery.
- Do: Keep flag evaluation fast and resilient. If the flag service is unreachable, fall back to the safe default rather than erroring.
- Consider: Separating short-lived release flags (removed after rollout) from long-lived operational toggles and kill-switches (kept on purpose).
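The naming, ownership, and removal-date advice above can be made checkable by keeping flag metadata in code. This is a minimal sketch under assumptions: `FlagKind`, `FlagDefinition`, `overdue_flags`, and the example flags are all hypothetical names, not part of any particular flag library.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Iterable, List, Optional


class FlagKind(Enum):
    RELEASE = "release"          # short-lived, removed after rollout
    OPERATIONAL = "operational"  # long-lived toggle or kill-switch, kept on purpose


@dataclass(frozen=True)
class FlagDefinition:
    name: str
    purpose: str
    owner: str
    kind: FlagKind
    default: bool = False             # the safe, current-behaviour state
    remove_by: Optional[date] = None  # expected for release flags


def overdue_flags(flags: Iterable[FlagDefinition], today: date) -> List[str]:
    """Names of release flags whose removal date has passed or was never set."""
    return [
        f.name
        for f in flags
        if f.kind is FlagKind.RELEASE and (f.remove_by is None or f.remove_by < today)
    ]


# Hypothetical registry entries for illustration.
REGISTRY = [
    FlagDefinition("new-payments", "Roll out new payments path", "payments",
                   FlagKind.RELEASE, remove_by=date(2024, 1, 31)),
    FlagDefinition("payments-kill-switch", "Disable payments in an incident",
                   "payments", FlagKind.OPERATIONAL),
]
```

A check such as `overdue_flags(REGISTRY, date.today())` could run in CI so that a release flag past its removal date is flagged for clean-up, while operational toggles are left alone.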
Don't let flags rot
- Do: Remove a release flag (and the dead branch behind it) once the feature is fully rolled out. Clean-up is part of the work, not optional.
- Do: Test both states of a flag, and remember the old behaviour still ships while the flag is off. Do not break it.
- Avoid: Deeply nesting flags or letting them pile up. The combinations explode and nobody can reason about the real behaviour.
- Avoid: Using a feature flag to gate a security control. Security must be on by construction, not dependent on a toggle someone could flip off (see Secure Defaults).
- Never: Use a flag as a permanent substitute for finishing or properly removing work. That is hidden technical debt (see Technical Debt).
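Testing both states, and seeing why piled-up flags become hard to reason about, might look like the sketch below. The `checkout_total` function, the `bulk-discount` flag, and the helper `all_flag_states` are hypothetical and exist only to illustrate the points in the list above.

```python
import itertools
from typing import Dict, List


def checkout_total(prices_in_cents: List[int], flags: Dict[str, bool]) -> int:
    """Hypothetical function with one flagged code path."""
    total = sum(prices_in_cents)
    # A missing flag is treated as off, so the old behaviour is the default.
    if flags.get("bulk-discount", False) and len(prices_in_cents) >= 5:
        total -= total // 10  # new behaviour: 10% off five or more items
    return total


def test_checkout_total_in_both_flag_states() -> None:
    prices = [1000] * 5
    expected = {False: 5000, True: 4500}
    for state, want in expected.items():
        got = checkout_total(prices, {"bulk-discount": state})
        assert got == want, f"flag={state}: got {got}, want {want}"
    # A missing flag must behave exactly like the off state.
    assert checkout_total(prices, {}) == expected[False]


def all_flag_states(names: List[str]) -> List[Dict[str, bool]]:
    """Every on/off combination for the given flags: 2 ** len(names) cases."""
    return [
        dict(zip(names, values))
        for values in itertools.product([False, True], repeat=len(names))
    ]
```

One flag means two cases to test. Three interacting flags already mean `len(all_flag_states(["a", "b", "c"]))` cases, which is eight, and the count doubles with each flag added.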
if (flags.Get("new-payments")) // throws if service down
    useNewPayments();
// merged 8 months ago, fully on, flag + old path still here
A flag failure breaks payments instead of falling back, and long-dead code lingers behind a stale flag nobody dares delete. Both are avoidable.
// default false; missing/failed lookup => old, working path
if (flags.GetOrDefault("new-payments", false)) useNewPayments();
// JIRA-1234: remove flag + old path after 100% rollout (owner: payments)
Failing closed keeps payments working, and the flag has an owner and a removal plan so it will not become permanent debt.
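How a fail-closed lookup like `GetOrDefault` might work internally can be sketched as a thin wrapper around the real client. This is an assumption-laden illustration: `SafeFlags`, `get_or_default`, and the injected `fetch` callable are hypothetical and do not describe any specific flag library.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("flags")


class SafeFlags:
    """Wraps a flag lookup so that any failure yields the caller's default."""

    def __init__(self, fetch: Callable[[str], Optional[bool]]) -> None:
        # Hypothetical client call: may raise, or return None for an unknown flag.
        self._fetch = fetch

    def get_or_default(self, name: str, default: bool = False) -> bool:
        try:
            value = self._fetch(name)
        except Exception:
            # Fail closed: log the problem and keep the safe path working.
            logger.warning("flag lookup failed for %r; using default %r", name, default)
            return default
        # Anything that is not a real boolean (such as None) also gets the default.
        return value if isinstance(value, bool) else default
```

With this shape, an unreachable flag service degrades to the old, working path instead of raising inside the payment flow, and the warning log keeps the failure visible.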
Self-review checklist
- Ask: If the flag lookup fails, does behaviour fall back to the safe or current path?
- Ask: Does this flag have a clear name, an owner, and a removal plan?
- Ask: Have I tested both the on and off states?
- Ask: Is this gating anything security-critical that should be on by construction instead?