Design & Architecture

Distributed Systems & Consistency

Advanced

As soon as your system spans more than one process — services, a database and a cache, a queue, a third party — you are building a distributed system, and the easy assumptions of single-process code stop holding. Networks fail and lag, messages duplicate and reorder, and two places can disagree about the truth. Design for this reality instead of being surprised by it.

Distributed systems have well-known, unavoidable trade-offs. The CAP theorem says that when a network partition occurs, a system must give up either consistency or availability; it cannot keep both. Most real systems accept eventual consistency: different parts agree over time rather than being identical at every moment. The common mistakes are assuming that a call always succeeds instantly, that everyone sees the same data at the same moment, or that you can update two systems atomically without a plan.

This section pulls together topics from across the handbook: idempotency and transactions (Data Integrity), messaging (Asynchronous Messaging), failure handling (Designing for Failure), and caching. Together they form the mindset you need when state lives in more than one place. For money and AML decisions, deciding where you need strong consistency and where eventual consistency is fine is a critical design call.
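One common way to stop assuming that a call always succeeds is to retry on timeout while reusing a single idempotency key, so that a retry after a lost response cannot charge twice. The following is a minimal in-memory sketch of that idea, not a definitive implementation. `PaymentService`, `RetryingClient`, and the receipt format are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical payment service that remembers the result for each idempotency key.
// Assumption: a real service would enforce this with a unique constraint in a
// durable store so that the check and the write are atomic.
public class PaymentService
{
    private readonly Dictionary<string, string> _resultsByKey = new();

    public int ChargesMade { get; private set; }

    public string Charge(string idempotencyKey, decimal amount)
    {
        // A repeated key returns the stored result instead of charging again.
        if (_resultsByKey.TryGetValue(idempotencyKey, out var existing))
            return existing;

        ChargesMade++;
        var receipt = $"receipt-{ChargesMade}";
        _resultsByKey[idempotencyKey] = receipt;
        return receipt;
    }
}

public static class RetryingClient
{
    // Retries on timeout, always passing the SAME key.
    // After maxAttempts timeouts the TimeoutException propagates to the caller.
    public static string ChargeWithRetry(
        Func<string, decimal, string> charge, string key, decimal amount, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return charge(key, amount);
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // The charge may or may not have happened; the key makes retrying safe.
            }
        }
    }
}
```

If the first call charges the customer but the response is lost, the retry with the same key returns the stored receipt and `ChargesMade` stays at 1. Generating a fresh key per attempt would defeat the protection.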

Assume the network and accept the trade-offs

Dual write, assume success

db.Save(payment);          // succeeds
bus.Publish(paymentEvent); // network blip -> never sent
// downstream never learns; systems now permanently disagree

Two separate writes with no consistency plan. A failure between them leaves the database and the rest of the system disagreeing forever. Across processes, you cannot assume both happen.

Outbox with idempotent consumers

using var tx = db.Begin();
db.Save(payment);
db.Save(outboxEvent); // same transaction as the payment
tx.Commit();          // one atomic transaction: both rows commit, or neither
// a relay publishes the outbox event; consumers are idempotent

The state change and the intent to publish commit together. Delivery happens reliably afterwards, and duplicates are safe. The systems converge correctly.
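The relay and the idempotent consumer mentioned in the comment above can be sketched as follows. This is an in-memory illustration under stated assumptions. `OutboxEvent`, `OutboxRelay`, and `IdempotentConsumer` are hypothetical names, the queue stands in for an outbox table, and the set of processed IDs stands in for a durable deduplication store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event shape: the Id is what consumers deduplicate on.
public record OutboxEvent(Guid Id, string Payload);

public class IdempotentConsumer
{
    private readonly HashSet<Guid> _processed = new();

    public int Applied { get; private set; }

    // Returns true if the event was applied, false if it was a duplicate.
    public bool Handle(OutboxEvent e)
    {
        if (!_processed.Add(e.Id))
            return false; // already seen: skip the side effect

        Applied++; // stand-in for the real side effect, performed once per event
        return true;
    }
}

public class OutboxRelay
{
    private readonly Queue<OutboxEvent> _outbox = new();

    public int Pending => _outbox.Count;

    public void Enqueue(OutboxEvent e) => _outbox.Enqueue(e);

    // Delivers each pending event. An event leaves the outbox only after
    // delivery returns, so a failure before removal leads to redelivery
    // (at-least-once), which is why consumers must be idempotent.
    public void Drain(Action<OutboxEvent> deliver)
    {
        while (_outbox.Count > 0)
        {
            deliver(_outbox.Peek());
            _outbox.Dequeue();
        }
    }
}
```

Because the relay removes an event only after delivery completes, it may deliver the same event more than once. The consumer's ID check turns those duplicates into no-ops, which is what lets the systems converge.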

Design for partial truth

Self-review checklist

Why it matters: Most real outages and data-integrity bugs in modern systems come from distributed-systems realities that single-process thinking ignores: dual writes that diverge, retries that double-charge, stale reads treated as truth. Designing for the network, choosing consistency on purpose, and making operations idempotent is what keeps a multi-part, money-handling system correct under the failures that will happen.