Engineering Excellence

Measuring What Matters

Intermediate

Metrics are powerful, and they can also do harm. The right ones focus a team and drive real improvement. The wrong ones, or the right ones used badly, distort behaviour, reward trivial work, and erode trust. This page is about choosing metrics that reflect real value, knowing their limits, and never letting a number replace judgement.

Goodhart's Law, in its common paraphrase, warns: "When a measure becomes a target, it ceases to be a good measure." People work to improve whatever is counted, so a badly chosen metric gradually reshapes behaviour around itself, often in ways that hurt the real goal. A related idea is the difference between actionable metrics, which tell you what to do, and vanity metrics, which look good but change nothing.

This page adds judgement on top of DORA, SPACE, and benchmarking. It explains how to pick, read, and use metrics well, and how to avoid the common traps that make measurement pointless or harmful.

Choose metrics that reflect value

Do: Prefer outcome metrics (did it deliver value, reliably?) over output metrics (how much was produced).
Do: Make metrics actionable. Each one should point to a decision or an improvement. If a number would not change what you do, do not track it as a target.
Do: Use balanced sets so that improving one does not quietly harm another: speed and stability, output and quality, individual and team (see DORA Metrics).
Do: Pair numbers with people. Numbers tell you what happened. People tell you why (see SPACE Framework).
Do: Watch trends over time against a baseline. Do not rely on single snapshots or league tables that rank teams (see Engineering Benchmarking).
Consider: Before you adopt a metric, ask: if someone tried to make this number look great, what would they do, and would that be good or bad?
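As one way to picture "trends over time against a baseline", the sketch below compares a team's recent average with its own earlier average instead of reading a single snapshot. The function name `trend_vs_baseline`, the window sizes, and the weekly lead-time figures are all hypothetical and chosen only for illustration; this is a minimal sketch, not a recommended tool.

```python
from statistics import mean


def trend_vs_baseline(values, baseline_window=4, recent_window=4):
    """Return the relative change of the recent mean against the baseline mean.

    A result of -0.20 means the recent average is 20% below the baseline.
    The baseline is the team's own earlier data, not another team's numbers.
    """
    if len(values) < baseline_window + recent_window:
        raise ValueError("not enough data points to separate baseline from recent")
    baseline = mean(values[:baseline_window])
    recent = mean(values[-recent_window:])
    if baseline == 0:
        raise ValueError("baseline is zero; relative change is undefined")
    return (recent - baseline) / baseline


# Hypothetical weekly lead times in hours for one team (illustrative only).
weekly_lead_time_hours = [50, 48, 52, 50, 47, 45, 44, 40, 41, 39, 40, 40]

change = trend_vs_baseline(weekly_lead_time_hours)
print(f"Lead time change vs baseline: {change:+.0%}")
```

The output is a prompt for a conversation about what changed, not a score. Comparing a team with its own past also avoids the league-table problem of ranking teams that work in different contexts.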

Avoid the traps

Always: Remember Goodhart's Law: once a measure becomes a target, expect people to game it. Design metrics with this in mind. Use them as signals for discussion, not as quotas.
Avoid: Vanity metrics such as lines of code, commit or PR counts, story-point "velocity" as a productivity score, and hours worked. They reward activity, not value.
Avoid: Single-number dashboards that claim to measure "productivity" or "quality". The real thing has many dimensions (see SPACE Framework).
Avoid: Letting a metric replace judgement. A green dashboard does not prove things are fine, and a red one does not prove someone failed.
Never: Use metrics against individuals (ranking people, or targets tied to commit counts). This causes gaming and destroys the trust the data depends on (see Respect & Inclusion).

Velocity as a target

// team told: increase story-point velocity 20% each quarter
// result: points inflated, corners cut, quality drops, trust erodes

Velocity became a target, so it stopped measuring anything real. Estimates grow and quality drops. This is Goodhart's Law: the number rises while value falls.
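One way to notice this pattern early is to read the targeted number next to a counter-metric it might be harming. The sketch below is a rough illustration under stated assumptions: `looks_gamed` is a hypothetical helper, the guardrail is assumed to be "lower is better" (here, escaped defects per sprint), and the sprint data is invented for the example.

```python
def looks_gamed(target_series, guardrail_series):
    """Return True when the target rose while a lower-is-better guardrail also rose.

    This only compares the first and last points, so it is a crude signal
    that should start a discussion, not a verdict about anyone.
    """
    if len(target_series) < 2 or len(guardrail_series) < 2:
        raise ValueError("need at least two points in each series")
    target_up = target_series[-1] > target_series[0]
    guardrail_worse = guardrail_series[-1] > guardrail_series[0]
    return target_up and guardrail_worse


# Hypothetical per-sprint figures (illustrative only).
velocity_points = [30, 33, 36, 40]   # the number that became a target
escaped_defects = [2, 3, 5, 7]       # the guardrail: lower is better

if looks_gamed(velocity_points, escaped_defects):
    print("Velocity is up but defects are up too: discuss in the retro.")
```

A True result does not prove gaming. It only says the headline number and the outcome are moving apart, which is the moment to ask the team why.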

Outcome, balance, and the team's view

// track DORA (speed + stability) + a satisfaction signal,
// review trends in retros, ask the team what is slowing them down,
// run an experiment, re-measure

Balanced outcome metrics show the real bottlenecks. The team's input explains them. Improvement comes from experiments, not from hitting a number people can game.
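To make the "speed + stability" pairing concrete, the sketch below reports deployment frequency and change failure rate side by side from a list of deployment records. The `Deployment` record, the `summarise` function, and the sample data are assumptions made for this example; real data would come from your own delivery tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    day: date
    caused_failure: bool  # True if the deploy led to an incident or rollback


def summarise(deployments, period_days):
    """Return a speed signal and a stability signal together.

    Reporting both in one place makes it harder to improve one
    while quietly harming the other.
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    total = len(deployments)
    failures = sum(1 for d in deployments if d.caused_failure)
    return {
        "deploys_per_week": total / (period_days / 7),
        "change_failure_rate": failures / total if total else 0.0,
    }


# Hypothetical four-week window: 8 deploys, 2 of which caused a failure.
sample = [
    Deployment(date(2024, 1, 2), False),
    Deployment(date(2024, 1, 5), True),
    Deployment(date(2024, 1, 9), False),
    Deployment(date(2024, 1, 12), False),
    Deployment(date(2024, 1, 16), False),
    Deployment(date(2024, 1, 19), True),
    Deployment(date(2024, 1, 23), False),
    Deployment(date(2024, 1, 26), False),
]

summary = summarise(sample, period_days=28)
print(summary)
```

These two numbers say what happened, not why. In the approach described above, they would be reviewed as trends in a retro alongside what the team says is slowing them down.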

Self-review checklist

Ask: Does this metric reflect real value and outcomes, or just activity?
Ask: If someone worked only to improve this number, would the result be good?
Ask: Is it balanced, and paired with the team's own view?
Ask: Am I letting the number replace judgement, or inform it?

Why it matters: Metrics shape behaviour whether you mean them to or not. So the wrong ones make a team worse: gaming, cut corners, and lost trust, all while looking like progress. Choose metrics that are actionable, balanced, and tied to real value. Hold them as signals, not targets. That is what makes measurement drive real improvement instead of empty reporting.