Data & Integrity

Database Design & Schema

Intermediate

The schema outlives the application. Code gets rewritten and services get replaced, but the data and its shape stay for years and gather dependencies the whole time. Design the schema as the lasting contract it is: correct, constrained, and clear about what the data means. A bad schema is the most expensive mistake to undo.

A well-designed schema makes invalid states impossible to represent. It lets the database enforce rules the application would otherwise have to remember everywhere. Constraints, keys, and types are not red tape. They are the last line of defence for data integrity, and the one that holds even when the application code has a bug.

We run Dapper over SQL Server, with no ORM doing the thinking for us. This makes careful schema design more important, not less. There is no migrations-from-models safety net, so the schema is exactly what we write. For a multi-tenant AML platform, tenancy and integrity must be designed into the tables themselves.

Model for correctness

AlwaysLet the database enforce integrity. Primary keys, foreign keys, NOT NULL, UNIQUE, and CHECK constraints encode rules that no code path can then break.
DoChoose precise types: proper decimal types for money (never float), explicit lengths, UTC datetimes, and real enums or lookup tables instead of free text.
DoNormalise by default to remove duplication and update anomalies. Denormalise only on purpose, for a measured performance reason, with the trade-off written down.
DoCarry the tenant key on every tenant-owned table and index it, so tenant scoping is enforceable and fast at the data layer.
ConsiderSequential surrogate keys for stability, such as an identity or bigint, or a time-ordered UUIDv7 (Guid.CreateVersion7()), with separate unique constraints on natural business keys.
AvoidRandom GUID or UUIDv4 values (Guid.NewGuid()) as a clustered primary key. Random values scatter inserts across the index, which causes constant page splits, fragmentation, poor cache locality, and bloat. Prefer a time-ordered UUIDv7 via Guid.CreateVersion7(). It is globally unique and safe to generate client-side, but sequential, so inserts append cleanly.
Do notStore money as floating point, dates as strings, or several values packed into one column. Each one is a future correctness bug.
NeverStore special-category or cardholder data we do not need. If we must hold it, encrypt it at rest and tighten access at the column level.

Integrity left to the app

CREATE TABLE Payments (
  Id uniqueidentifier,
  Amount float,
  Currency varchar(50),
  CustomerId uniqueidentifier
);

Float loses pennies. There are no keys or constraints, no tenant column, and the currency is free text. Every rule about this data now lives only in code, and only until someone forgets it.

The database enforces the rules

CREATE TABLE Payments (
  Id uniqueidentifier PRIMARY KEY,
  TenantId uniqueidentifier NOT NULL,
  CustomerId uniqueidentifier NOT NULL REFERENCES Customers(Id),
  Amount decimal(19,4) NOT NULL CHECK (Amount >= 0),
  CurrencyCode char(3) NOT NULL,
  CreatedUtc datetime2 NOT NULL
);
CREATE INDEX IX_Payments_Tenant_Customer ON Payments(TenantId, CustomerId);

Money is exact, the engine guarantees relationships and ranges, and tenant scoping is present and indexed.

Random GUID as the clustered key

Id uniqueidentifier DEFAULT NEWID() PRIMARY KEY   -- random
// app side: new Customer { Id = Guid.NewGuid() }

Each new row's key is random, so SQL Server inserts it into the middle of the clustered index. This causes endless page splits, heavy fragmentation, and a cache full of half-empty pages. On a busy table this quietly slows insert throughput.

Time-ordered UUIDv7

Id uniqueidentifier PRIMARY KEY   -- value supplied by the app
// app side: new Customer { Id = Guid.CreateVersion7() }

UUIDv7 includes a timestamp, so values are roughly sequential. Inserts append to the end of the index like an identity column, while staying globally unique and safe to generate client-side (no round trip, no central sequence).

Design for change and scale

DoIndex to match real query patterns. Cover the predicates you actually filter and join on, including the tenant key, and verify with execution plans.
DoName things consistently and clearly. The schema is read far more often than it is changed, and good names prevent whole classes of mistake.
DoAdd soft-delete and audit columns (created, updated, by whom) where regulated or auditable data lives, instead of adding them later.
ConsiderHow the table will grow, including partitioning, archival, and retention, before it is large enough to make those changes painful.
Do notOver-index. Every index is a write cost and a maintenance burden, so add them for evidence, not for comfort.
NeverMake a schema change directly in production without change control, review, and a migration (see Schema Versioning).

Self-review checklist

AskCan the database represent an invalid state here, or do constraints make it impossible?
AskIs money exact, are dates UTC, and are categorical values constrained rather than free text?
AskIs the tenant key present and indexed on every tenant-owned table?
AskDo my indexes match the queries I actually run, and have I checked the plan?

Why it matters: Data is the most lasting and most valuable thing we own, and the schema is the only place integrity can be guaranteed rather than hoped for. A schema that enforces correctness prevents bugs the application cannot, while a careless one quietly corrupts data for years before anyone notices. In regulated finance, corrupted data is a reporting failure waiting to happen.