Engineering Excellence

Engineering Benchmarking

Intermediate

You cannot improve what you do not measure. But you can easily make things worse by measuring the wrong things. Benchmarking means comparing how we deliver software against useful baselines: our own past, industry research, and best-practice frameworks. Done well, it shows us where to improve. Done badly, it pushes people toward the wrong behaviour. We benchmark to learn, never to rank people.

Great engineering teams are made, not born. The teams that improve fastest are the ones that measure honestly and act on what they find. This group of guidelines is your toolkit for that: the DORA metrics (delivery performance), the SPACE framework (broad productivity), developer experience, continuous improvement, and how to benchmark against outside best practice. This page sets the ground rules so the metrics help rather than harm.

Two principles run through all of it. First, measure outcomes and systems, not individuals. Metrics are tools to improve how the team works, not to rank or punish people; ranking people only creates gaming and fear. Second, no single number tells the truth, so use balanced sets of metrics and pair them with input from the team.

Benchmark to improve

Do: Be clear what question a metric answers before you adopt it. Start from "what do we want to improve?", not "what can we count?".
Do: Set a baseline (where are we now?) before you try to improve, so you can tell whether a change actually helped.
Do: Compare against useful references: our own trend over time, industry research (such as DORA's State of DevOps), and best-practice frameworks. Do not use arbitrary targets.
Do: Use balanced sets of metrics so improving one does not quietly damage another (for example, balance speed against stability; see DORA Metrics).
Do: Pair the numbers with the team's input. The team explains what the metrics cannot (see SPACE Framework).
Do: Treat metrics as a starting point for a discussion and an experiment. Then measure again to see if the change worked (see Continuous Improvement).
Consider: Reviewing benchmarks on a regular schedule rather than constantly. Trends matter more than any single reading.
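
As a minimal sketch of the baseline-then-remeasure loop described above, the following Python compares a team's weekly figures before and after a change. The record layout (`deploys`, `failed_deploys`), the function names, and the sample numbers are hypothetical and chosen only for illustration. Pairing a speed metric with a stability metric follows the balance guideline above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class WeekSample:
    """One week of team-level delivery data (hypothetical layout)."""
    deploys: int
    failed_deploys: int


def summarise(weeks: list[WeekSample]) -> dict[str, float]:
    """Return a balanced pair: one speed metric and one stability metric."""
    total_deploys = sum(w.deploys for w in weeks)
    total_failed = sum(w.failed_deploys for w in weeks)
    return {
        "deploys_per_week": mean(w.deploys for w in weeks),
        "change_failure_rate": total_failed / total_deploys if total_deploys else 0.0,
    }


def compare(baseline: list[WeekSample], current: list[WeekSample]) -> dict[str, float]:
    """Difference (current minus baseline) for each metric in the pair."""
    before, after = summarise(baseline), summarise(current)
    return {name: after[name] - before[name] for name in before}


# Illustrative numbers only, not real data.
baseline_weeks = [WeekSample(4, 1), WeekSample(6, 1), WeekSample(5, 0)]
current_weeks = [WeekSample(8, 2), WeekSample(10, 3), WeekSample(9, 4)]

delta = compare(baseline_weeks, current_weeks)
```

With these made-up numbers, deployment frequency goes up but so does the change failure rate. A single number would hide that trade-off, and the sensible next step is a conversation with the team rather than a verdict.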

Don't let measurement do harm

Always: Measure teams and systems, not individuals. These metrics are for improving how we work, never for ranking, rating, or punishing people.
Do: Remember Goodhart's Law: when a measure becomes a target, it stops being a good measure. Watch for gaming (see Measuring What Matters).
Do: Be open about what is measured and why. Metrics done to people create fear and gaming. Metrics done with them build improvement.
Avoid: Vanity metrics that look impressive but drive nothing, such as lines of code, raw commit counts, and hours worked. They reward the wrong things.
Avoid: Comparing teams that work in very different settings as if the number alone tells the whole story.
Never: Use engineering metrics to monitor, rank, or pressure individuals. It destroys trust and corrupts the data (see Respect & Inclusion, Wellbeing & Sustainable Pace).
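
One way to reflect the "teams and systems, not individuals" rule above is in how the data is aggregated. The sketch below is only an illustration under stated assumptions: the field names (`team`, `author`, `lead_time_hours`), the function name, and the `MIN_SAMPLE_SIZE` threshold are all hypothetical. It reports a median per team, never reads the author field, and withholds a figure when the sample is so small that it could be traced back to one person.

```python
from collections import defaultdict
from statistics import median
from typing import Optional

# Assumed threshold: with fewer changes than this, a "team" figure is noisy
# and can effectively point at a single person, so it is not reported.
MIN_SAMPLE_SIZE = 5


def team_lead_times(changes: list[dict]) -> dict[str, Optional[float]]:
    """Median lead time in hours per team, ignoring who authored each change."""
    hours_by_team: dict[str, list[float]] = defaultdict(list)
    for change in changes:
        # Only the team and the duration are read; the author field is never used.
        hours_by_team[change["team"]].append(change["lead_time_hours"])
    return {
        team: median(hours) if len(hours) >= MIN_SAMPLE_SIZE else None
        for team, hours in hours_by_team.items()
    }


# Illustrative records only, not real data.
changes = [
    {"team": "payments", "author": "a", "lead_time_hours": 10.0},
    {"team": "payments", "author": "b", "lead_time_hours": 20.0},
    {"team": "payments", "author": "a", "lead_time_hours": 30.0},
    {"team": "payments", "author": "c", "lead_time_hours": 40.0},
    {"team": "payments", "author": "b", "lead_time_hours": 50.0},
    {"team": "search", "author": "d", "lead_time_hours": 12.0},
    {"team": "search", "author": "d", "lead_time_hours": 90.0},
]

result = team_lead_times(changes)
```

Dropping the author at aggregation time is a design choice, not a guarantee. It makes the team-level view the default and makes an individual-level view something that has to be built deliberately.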

Self-review checklist

Ask: What decision or improvement will this metric actually drive?
Ask: Do I have a baseline to compare against?
Ask: Could optimising this number make something else worse, or be gamed?
Ask: Is this measuring the team/system, or sliding into measuring individuals?
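
The checklist above could also be captured as a lightweight review step whenever someone proposes a new metric. The following is a rough sketch, not a prescribed process: the `MetricProposal` record, its field names, and the wording of the concerns are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricProposal:
    """A proposed metric, with one field per checklist question (hypothetical)."""
    name: str
    decision_it_drives: str     # What decision or improvement will it drive?
    baseline: Optional[float]   # Do we have a baseline to compare against?
    counter_metric: str         # What could get worse or be gamed, and how will we notice?
    unit_of_analysis: str       # "team", "system", or "individual"


def review(proposal: MetricProposal) -> list[str]:
    """Return the open checklist concerns; an empty list means none were found."""
    concerns = []
    if not proposal.decision_it_drives.strip():
        concerns.append("No decision or improvement named.")
    if proposal.baseline is None:
        concerns.append("No baseline to compare against.")
    if not proposal.counter_metric.strip():
        concerns.append("No counter-metric to catch side effects or gaming.")
    if proposal.unit_of_analysis not in {"team", "system"}:
        concerns.append("Measures individuals rather than the team or system.")
    return concerns


# Illustrative proposals only.
vanity = MetricProposal("lines of code per developer", "", None, "", "individual")
balanced = MetricProposal(
    name="deployment frequency",
    decision_it_drives="Whether to invest in release automation",
    baseline=5.0,
    counter_metric="change failure rate",
    unit_of_analysis="team",
)

vanity_concerns = review(vanity)
balanced_concerns = review(balanced)
```

A check like this only prompts the questions. The answers still need discussion with the team.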

Why it matters: Teams that measure their delivery honestly and act on what they find do better than teams that do not. This is a consistent theme in years of industry research. But the same measurements, pointed at individuals or treated as targets, create fear and gaming and make everything worse. Benchmarking the right way holds us to a high standard and helps us keep getting better. Benchmarking the wrong way damages good teams.