Design & Architecture

Third-Party Integrations & Resilience

Foundational

Every external dependency is a system you rely on but do not control. It will be slow, it will go down, it will change its responses, and one day it will be wrong. Integrate so their bad day does not become yours. Put the provider behind a layer you own, verify that responses are genuine, set sensible timeouts, and give a safe answer when the provider cannot.

We use third parties for things we should not build ourselves: identity (Veriff, Sumsub), payments (Stripe), storage, and more. This is the right choice. But each integration brings the provider's availability, latency, security, and correctness into our system. Resilient integration means you treat failures as normal and design for them on purpose. Do not just code the success path and hope.

Two concerns matter most for us. First, authenticity: verify inbound callbacks and webhooks before you trust them. The Finperiti unsigned-webhook finding shows what happens when you do not. Second, the fail direction: when a screening or KYC provider times out or errors, block-and-escalate, never auto-approve (see Designing for Failure). Resilience and safety are the same discipline here.
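The fail direction can be built into the code's structure so nobody has to remember it. This is a minimal sketch under stated assumptions. `Decide` is hypothetical. It receives a provider status string that our adapter has already mapped, plus a timed-out flag. Real providers such as Veriff or Sumsub return their own schemas. Only one explicit, recognised outcome approves, and every other path falls through to block-and-escalate.

```csharp
using System;

// Hypothetical decision function for a screening/KYC result.
// Fail closed: only an explicit, recognised "clear" result approves.
string Decide(string providerStatus, bool timedOut)
{
    if (timedOut) return "BlockAndEscalate";   // no answer is not a "yes"

    return providerStatus switch
    {
        "clear" => "Approve",
        // "hit", "error", null, or any value we do not recognise
        _ => "BlockAndEscalate",
    };
}

Console.WriteLine(Decide("clear", timedOut: false));
Console.WriteLine(Decide("pending", timedOut: false));
```

The default branch is the escalation. A new or misspelled provider status therefore cannot quietly approve anyone.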

Isolate and verify

Do: Put each provider behind an interface or adapter you own. This stops vendor types and quirks from leaking into business logic and lets you swap or mock the provider.
Do: Verify that every inbound webhook or callback is genuine (signature, shared secret, or mutual TLS) before you act on it, and guard against replay (timestamp or nonce).
Do: Treat provider responses as untrusted input. Validate and constrain them, and do not assume the shape, range, or values you expect (see Trust Boundaries).
Do: Keep provider credentials in the vault and read them through the secure workflow. Never hard-code them in the integration code.
Never: Act on an inbound webhook or third-party callback before you verify it is genuine.
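This is a minimal, dependency-free sketch of signature and replay checking. It assumes a hypothetical scheme: the sender computes HMAC-SHA256 over `<unix timestamp>.<raw body>` with a shared secret, then sends the timestamp and hex signature with the request. Stripe's scheme has this general shape, but providers differ in header names and encodings. Where an official library exists, use it instead; for Stripe that is `EventUtility.ConstructEvent` in Stripe.net. `IsGenuine` and its parameters are illustrative only.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical verifier. Assumed scheme: hex(HMAC-SHA256(secret, "<timestamp>.<rawBody>")).
// Real providers define their own headers and encodings; follow their documentation.
bool IsGenuine(string rawBody, long timestamp, string signatureHex, string secret,
               long nowUnix, long toleranceSeconds = 300)
{
    // Replay guard: reject deliveries whose timestamp is too old or too far in the future.
    if (Math.Abs(nowUnix - timestamp) > toleranceSeconds) return false;

    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
    byte[] expected = hmac.ComputeHash(Encoding.UTF8.GetBytes($"{timestamp}.{rawBody}"));

    byte[] provided;
    try { provided = Convert.FromHexString(signatureHex); }
    catch (FormatException) { return false; }   // malformed signature: not genuine

    // Constant-time comparison: timing does not reveal how many bytes matched.
    return CryptographicOperations.FixedTimeEquals(expected, provided);
}
```

In a real handler, pass `DateTimeOffset.UtcNow.ToUnixTimeSeconds()` as `nowUnix` and verify against the raw request body, not a re-serialized object. The timestamp window only narrows replay. Recording processed event IDs, as in the idempotency rule, closes it.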

Trust the callback, no timeout

[HttpPost("/stripe/webhook")]
public IActionResult OnPaid([FromBody] Event e)   // body bound and trusted as-is
{
    order.MarkPaid();   // no signature check: a forged "paid" event is accepted
    return Ok();
}

Anyone can POST a fake 'paid' event and get goods for free. Separately, outbound calls made without a timeout let one slow provider tie up every request thread. Both authenticity and resilience are missing.

Verify, then act; bounded calls elsewhere

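var rawBody = await new StreamReader(Request.Body).ReadToEndAsync();
// Stripe.net: throws StripeException if the signature is forged or the timestamp is outside the tolerance window
var evt = EventUtility.ConstructEvent(rawBody, Request.Headers["Stripe-Signature"], secret);
if (evt.Type == "payment_intent.succeeded") order.MarkPaid(evt.Id); // idempotent per event ID
// outbound: HttpClient with an explicit Timeout (the default is 100 s) + Polly retry/circuit-breaker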

The webhook signature is checked cryptographically before any action is taken. Marking the order paid is idempotent, so duplicate deliveries are harmless. Outbound calls have explicit timeouts and are isolated.
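Here is a minimal sketch of what "idempotent" means for a webhook handler, keyed on the provider's event ID. The `ApplyOnce` helper and the in-memory `HashSet` are illustrative assumptions. In a real service, the "already processed" marker would be a uniquely keyed row written in the same database transaction as the order update. That way, concurrent or repeated deliveries cannot both take effect.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: run `apply` at most once per provider event ID.
// Returns true if this call applied the event, false if it was a duplicate.
// Single-threaded sketch; production code needs an atomic, persistent marker.
bool ApplyOnce(ISet<string> processedEventIds, string eventId, Action apply)
{
    // Claim the ID first; Add returns false if it is already present.
    if (!processedEventIds.Add(eventId)) return false;
    try
    {
        apply();
        return true;
    }
    catch
    {
        // Release the claim so a later redelivery can retry the work.
        processedEventIds.Remove(eventId);
        throw;
    }
}

var seen = new HashSet<string>();
Console.WriteLine(ApplyOnce(seen, "evt_123", () => Console.WriteLine("order marked paid")));
Console.WriteLine(ApplyOnce(seen, "evt_123", () => Console.WriteLine("order marked paid")));
```

The second delivery of `evt_123` returns false without touching the order. If applying the change fails, the claim is released so the provider's retry can succeed later.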

Survive their failures

Always: Set an explicit timeout on every outbound call. Use retries with backoff only for transient, idempotent operations.
Do: Apply circuit breakers and bulkheads so that a struggling provider degrades one feature instead of exhausting every thread and taking down the system.
Do: Decide the safe behaviour for each failure mode on purpose. For screening, KYC, and sanctions, that means block-and-escalate, never auto-approve.
Consider: Use idempotency keys and an outbox so that a provider's duplicate delivery or your own retry does not double-charge or double-apply.
Consider: Reconcile against the provider (for example, payment status) to catch missed or out-of-order callbacks. Do not trust a single notification.
Never: Auto-approve a customer or transaction when a provider check is missing, failed, timed out, or returned a result you do not recognise.
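This is a dependency-free sketch of the timeout and retry rules. In production, a library such as Polly would typically provide this, along with circuit breakers. The `CallWithRetryAsync` helper and its parameters are hypothetical. Each attempt gets its own deadline from a `CancellationTokenSource`. Only failures treated as transient are retried: here, `HttpRequestException` and our own per-attempt timeout. The delay doubles between attempts.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: run `operation` with a per-attempt timeout, retrying
// transient failures with exponential backoff. Only use for idempotent calls.
// Non-transient errors, and the error from the final attempt, propagate to the caller.
async Task<T> CallWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> operation,
    int maxAttempts,
    TimeSpan perAttemptTimeout,
    TimeSpan baseDelay)
{
    for (int attempt = 1; ; attempt++)
    {
        // Fresh deadline for each attempt, so a hung provider cannot hold the caller
        // indefinitely (provided the operation honours the token).
        using var cts = new CancellationTokenSource(perAttemptTimeout);
        try
        {
            return await operation(cts.Token);
        }
        catch (Exception ex) when (attempt < maxAttempts &&
            (ex is HttpRequestException ||
             (ex is OperationCanceledException && cts.IsCancellationRequested)))
        {
            // Transient: wait baseDelay, 2 x baseDelay, 4 x baseDelay, ... then retry.
            await Task.Delay(baseDelay * Math.Pow(2, attempt - 1));
        }
    }
}
```

Usage for a read-only status lookup might look like `await CallWithRetryAsync(ct => http.GetStringAsync(url, ct), 3, TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(200))`. When the last attempt fails, the exception propagates. The caller should then take the safe path; for a screening check, that is block-and-escalate.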

Self-review checklist

Ask: Is this provider behind an adapter I own, or are its types leaking into my logic?
Ask: For inbound callbacks, do I verify they are genuine and prevent replay before I act?
Ask: What happens when this provider is slow, down, or returns bad data? Do I time out and give a safe answer?
Ask: If a check cannot finish, do I block-and-escalate, or do I proceed by accident?

Why it matters: Third parties are where availability and trust leave our control, so this is where outages and security holes most often start. A forged webhook can let a fraudster pass KYC, and a call with no timeout to a slow provider can take down the whole service. When we isolate the provider, verify its responses, and fail safe, their bad days become a contained, correct response on our side.