gRPC & Protobuf Conventions
gRPC services are defined by their .proto files. That schema is the contract between every client and server, and once other teams depend on it you can only change it in safe, additive ways. Design the proto with care, use the standard status codes and deadlines, and secure the transport. This is the concrete conventions reference that goes with the broader API & Contract Design guideline.
gRPC gives you a typed, fast, binary contract and generated client code in many languages. The cost is that the wire format depends on field numbers, not names, so a small change to a .proto file can quietly break every existing client. Treat the proto as a long-lived public contract from day one.
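A small stdlib-only Python sketch makes the point about field numbers concrete. It encodes a string field the way protobuf does on the wire. The key is a varint of `(field_number << 3) | wire_type`, using wire type 2 for length-delimited data. The key is followed by the byte length and then the UTF-8 bytes.

The helpers and the "old client" schema are hypothetical and handle string fields only. Real services should use generated protobuf code.

```python
# Stdlib-only illustration of the protobuf wire format for string fields.
# Hypothetical helpers; real code should use generated protobuf classes.

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low group first, high bit = 'more follows'."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, position after it)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Key ((number << 3) | wire type 2), then byte length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def decode_string_fields(buf: bytes) -> dict[int, str]:
    """Decode a message made only of string fields into {field_number: text}."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("sketch only handles length-delimited fields")
        length, pos = decode_varint(buf, pos)
        fields[number] = buf[pos:pos + length].decode("utf-8")
        pos += length
    return fields


# The field *name* never appears on the wire: field 1 = "Ada" is b"\x0a\x03Ada".
# An old client still reads number 2 as ssn, so a new writer that reused
# number 2 for email silently hands it an email address as an SSN.
OLD_CLIENT_NAMES = {1: "name", 2: "ssn"}  # hypothetical v1 schema
new_payload = encode_string_field(1, "Ada") + encode_string_field(2, "ada@example.com")
seen_by_old_client = {OLD_CLIENT_NAMES[n]: v for n, v in decode_string_fields(new_payload).items()}
```

Renaming a field changes nothing in these bytes. Reusing its number changes what every existing reader believes the bytes mean.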
All the security rules still apply. Authenticate and authorise every call, validate input on the server, never return more data than the caller needs, and never leak internals (see Authentication & Authorization, Trust Boundaries). gRPC provides transport credentials such as TLS, but it does not authorise calls for you. Per-call identity (for example, a bearer token) therefore travels in metadata and is checked by a server interceptor.
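The interceptor pattern can be modelled without a gRPC dependency. This Python sketch is an assumption throughout:

- `RpcError`, `Identity`, `VALID_TOKENS`, and `auth_interceptor` are hypothetical names.
- Metadata is modelled as a list of `(key, value)` pairs with lowercase keys.
- The token lookup stands in for real token verification.

The sketch shows the two checks in order: UNAUTHENTICATED when identity is missing, then PERMISSION_DENIED when the caller lacks the required scope. It also shows that the tenant comes from the verified identity, never from the request.

```python
from dataclasses import dataclass
from typing import Callable


class RpcError(Exception):
    """Hypothetical error carrying a canonical status code name."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


@dataclass(frozen=True)
class Identity:
    subject: str
    tenant: str
    scopes: frozenset


# Hypothetical token table; a real service verifies a signed token instead.
VALID_TOKENS = {
    "tok-alice": Identity("alice", "tenant-a", frozenset({"customers.read"})),
}

Metadata = list[tuple[str, str]]
Handler = Callable[[dict, Identity], dict]


def auth_interceptor(required_scope: str, handler: Handler) -> Callable[[dict, Metadata], dict]:
    """Wrap a handler so it only runs for an authenticated, authorised caller."""

    def wrapped(request: dict, metadata: Metadata) -> dict:
        tokens = [v for k, v in metadata if k == "authorization"]
        if len(tokens) != 1 or not tokens[0].startswith("Bearer "):
            raise RpcError("UNAUTHENTICATED", "missing or malformed credentials")
        identity = VALID_TOKENS.get(tokens[0][len("Bearer "):])
        if identity is None:
            raise RpcError("UNAUTHENTICATED", "invalid credentials")
        if required_scope not in identity.scopes:
            raise RpcError("PERMISSION_DENIED", "caller may not perform this call")
        # Identity and tenant come from the verified token, never the request body.
        return handler(request, identity)

    return wrapped


def get_customer(request: dict, identity: Identity) -> dict:
    # Any "tenant" the client put in the request is ignored.
    return {"id": request["id"], "tenant": identity.tenant}


def delete_customer(request: dict, identity: Identity) -> dict:
    return {"deleted": request["id"]}


get_customer_rpc = auth_interceptor("customers.read", get_customer)
delete_customer_rpc = auth_interceptor("customers.write", delete_customer)
```

Because every handler is registered through the wrapper, no RPC can be added that skips authentication.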
Proto & schema design
- Do: Use proto3. Name services and RPCs in PascalCase (CustomerService, GetCustomer) and message fields in snake_case. Put the service in a versioned package (finperiti.customers.v1).
- Do: Give each RPC its own request and response message, even if one field would do today. This lets you add fields later without changing the method signature.
- Do: Use the well-known types for common values: google.protobuf.Timestamp for time, google.protobuf.Duration for spans, and explicit enums for fixed sets. Make the first enum value a *_UNSPECIFIED default numbered 0.
- Avoid: Mapping proto messages one-for-one to database rows, or putting unrelated calls on one giant service. Model the domain and the use cases, not the storage.
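The naming and enum rules above are mechanical enough to check in CI. This is a minimal Python sketch over an already-parsed description of a service; parsing .proto files is out of scope. `lint_service` and its input shape are hypothetical, and an existing protobuf linter could do the same job.

```python
import re

PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Lowercase dotted package ending in a version segment, e.g. finperiti.customers.v1
VERSIONED_PACKAGE = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*\.v[0-9]+$")


def lint_service(package: str, service: str,
                 rpcs: dict[str, tuple[str, str]],
                 fields: list[str],
                 enum_values: list[tuple[str, int]]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    rpcs maps RPC name -> (request message, response message).
    enum_values lists (name, number) in declaration order.
    """
    problems = []
    if not VERSIONED_PACKAGE.match(package):
        problems.append(f"package {package!r} has no version segment like .v1")
    if not PASCAL_CASE.match(service):
        problems.append(f"service {service!r} is not PascalCase")
    used_by = {}  # message name -> first RPC that used it
    for rpc, messages in rpcs.items():
        if not PASCAL_CASE.match(rpc):
            problems.append(f"rpc {rpc!r} is not PascalCase")
        for message in messages:
            if message in used_by:
                problems.append(f"{rpc} reuses message {message} from {used_by[message]}")
            else:
                used_by[message] = rpc
    for name in fields:
        if not SNAKE_CASE.match(name):
            problems.append(f"field {name!r} is not snake_case")
    if not enum_values or enum_values[0][1] != 0 or not enum_values[0][0].endswith("_UNSPECIFIED"):
        problems.append("first enum value must be *_UNSPECIFIED = 0")
    return problems
```

Sharing a message between RPCs is flagged because a field added later for one RPC would leak into the other's contract.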
Status codes & errors
- Do: Return the correct canonical status code:
  - OK for success
  - INVALID_ARGUMENT for validation failures
  - UNAUTHENTICATED, PERMISSION_DENIED, NOT_FOUND, ALREADY_EXISTS, and FAILED_PRECONDITION for their named conditions
  - RESOURCE_EXHAUSTED for rate limits
  - INTERNAL for server faults
  - UNAVAILABLE for transient outages that a client may retry
- Never: Return OK for a failed call. The status code is the result. An OK with an error packed into the response body hides the failure from clients, retries, and monitoring.
- Do: Attach structured detail to errors with google.rpc.Status details (for example, field-level validation errors), using a stable, documented shape across services.
- Never: Leak internal details (stack traces, SQL, secrets, file paths) in a status message or detail, or return another tenant's or user's data (see Multi-Tenancy, Error Handling).
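Mapping errors to status codes in one place keeps handlers from accidentally returning OK or leaking internals. In this sketch, `Code` mirrors a subset of the canonical gRPC codes with their standard numeric values.

The exception classes, `to_status`, and the detail dict are hypothetical. The detail dict is loosely shaped like `google.rpc.BadRequest` field violations. A real service would pack the details into `google.rpc.Status` through its gRPC library.

```python
import logging
from enum import IntEnum


class Code(IntEnum):
    """Subset of canonical gRPC status codes, with their wire values."""
    OK = 0
    INVALID_ARGUMENT = 3
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


class ValidationError(Exception):
    """Hypothetical domain error carrying (field, description) pairs."""

    def __init__(self, violations: list[tuple[str, str]]):
        super().__init__("validation failed")
        self.violations = violations


class NotFoundError(Exception):
    """Hypothetical domain error for a missing resource."""


log = logging.getLogger("customers.rpc")


def to_status(exc: Exception) -> tuple[Code, str, list[dict]]:
    """Map an exception to (code, client-safe message, structured details)."""
    if isinstance(exc, ValidationError):
        details = [{"field_violations": [
            {"field": name, "description": description}
            for name, description in exc.violations
        ]}]
        return Code.INVALID_ARGUMENT, "request failed validation", details
    if isinstance(exc, NotFoundError):
        # Fixed message: the exception text (which may contain IDs) is not echoed.
        return Code.NOT_FOUND, "resource not found", []
    # Anything unexpected: full detail goes to server logs only, never to the client.
    log.error("unhandled error in RPC handler", exc_info=exc)
    return Code.INTERNAL, "internal error", []
```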
Deadlines, streaming & limits
- Always: Set a deadline on every call and honour it on the server. A call with no deadline can hang forever and tie up resources. Propagate the incoming deadline to any downstream calls.
- Do: Cap the maximum message size, and use streaming for large or long-running data instead of one huge message. Handle backpressure on streams so a slow consumer cannot exhaust memory.
- Do: Make retries safe. Only retry idempotent RPCs, use a retry policy with backoff, and use idempotency keys where a repeat could apply a change twice (see Data Integrity & Transactions, Rate Limiting & Abuse Prevention).
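Deadline propagation and safe retries fit together. Each attempt gets only the time left on the caller's deadline, and only idempotent calls are retried, with capped exponential backoff and full jitter.

This is a stdlib Python sketch, not a gRPC retry policy. `call_with_retry`, `Unavailable`, and `DeadlineExceeded` are hypothetical. The clock, sleep, and random functions are injectable so the logic can be tested. A non-idempotent write would additionally need an idempotency key before it could be retried.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Unavailable(Exception):
    """Hypothetical transient failure, the analogue of UNAVAILABLE."""


class DeadlineExceeded(Exception):
    """Hypothetical analogue of DEADLINE_EXCEEDED."""


def call_with_retry(op: Callable[[float], T], *, deadline: float, idempotent: bool,
                    max_attempts: int = 4, base_delay: float = 0.1, max_delay: float = 2.0,
                    now: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> T:
    """Run op(remaining_seconds), retrying transient failures only when safe.

    `deadline` is an absolute time on the `now` clock, inherited from the
    incoming call; op receives the remaining budget to use as its own timeout.
    """
    attempt = 0
    while True:
        remaining = deadline - now()
        if remaining <= 0:
            raise DeadlineExceeded("deadline passed before the call was made")
        attempt += 1
        try:
            return op(remaining)
        except Unavailable:
            if not idempotent or attempt >= max_attempts:
                raise  # give up: not safe to replay, or out of attempts
            # Capped exponential backoff with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1)) * rng()
            if now() + delay >= deadline:
                raise  # no time left to retry within the caller's deadline
            sleep(delay)
```

Passing the remaining budget into `op` is what keeps a chain of services from each waiting out its own full timeout.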
Safety & evolution
- Always: Use TLS for every connection, and mTLS between internal services. Authenticate from metadata and authorise every RPC in an interceptor, deriving identity and tenant on the server (see Authentication & Authorization, Secrets at Rest & in Transit).
- Never: Reuse, renumber, or change the type of an existing field number. Field numbers are the wire contract. To drop a field, reserve its number and name so they are never used again.
- Do: Evolve by adding fields and new RPCs, never by repurposing existing ones. Any field you remove must be reserved. Make breaking changes in a new package version (v2) with a deprecation path (see Backward Compatibility).
- Do: Use interceptors for cross-cutting concerns (auth, logging, tracing, metrics) so every RPC is covered the same way, and validate every request on the server.
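The evolution rules can also be checked mechanically by comparing the old and new definitions of a message. This Python sketch works on a hand-built, hypothetical `MessageSchema` rather than parsing .proto files. That structure maps each field number to a name and type, plus reserved numbers and names. In practice, a schema-comparison step in CI would do this job.

The sketch flags:

- removed fields that were not reserved
- type changes
- renumbering
- names changed at an existing number
- reuse of reserved numbers or names

```python
from dataclasses import dataclass, field


@dataclass
class MessageSchema:
    """Hypothetical parsed message: number -> (name, type), plus reservations."""
    fields: dict[int, tuple[str, str]]
    reserved_numbers: set[int] = field(default_factory=set)
    reserved_names: set[str] = field(default_factory=set)


def breaking_changes(old: MessageSchema, new: MessageSchema) -> list[str]:
    """List changes from old to new that break the wire contract or risk repurposing."""
    problems = []
    old_number_of = {name: number for number, (name, _) in old.fields.items()}
    for number, (name, type_) in old.fields.items():
        if number in new.fields:
            new_name, new_type = new.fields[number]
            if new_type != type_:
                problems.append(f"field {number} changed type {type_} -> {new_type}")
            if new_name != name:
                # Wire-compatible on its own, but often a sign the number is being repurposed.
                problems.append(f"field {number} renamed {name} -> {new_name}")
        else:
            if number not in new.reserved_numbers:
                problems.append(f"removed field {number} ({name}) but did not reserve its number")
            if name not in new.reserved_names:
                problems.append(f"removed field {name!r} but did not reserve its name")
    for number, (name, _) in new.fields.items():
        if number in old.reserved_numbers or name in old.reserved_names:
            problems.append(f"field {number} ({name}) reuses a reserved number or name")
        if name in old_number_of and old_number_of[name] != number:
            problems.append(f"field {name!r} renumbered {old_number_of[name]} -> {number}")
    return problems
```

For example, dropping `ssn = 2` and adding `email = 3` passes only if 2 and "ssn" are reserved. Putting `email` at number 2 is reported.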
A service, end to end
message Customer { string name = 1; string email = 2; } // 2 was ssn in v1: number reused
// server: return OK with { error: "not found" } in the body
// client: call with no deadline, no TLS
A reused field number corrupts the wire format for old clients, an OK hides a real failure, and a call with no deadline over plaintext can hang and leak data. Broken and unsafe.
message Customer { string name = 1; reserved 2; reserved "ssn"; }
// server: return NOT_FOUND with google.rpc.Status details, no internals
// client: deadline set, TLS on, retry only if idempotent
Field numbers are stable and removed ones are reserved, the correct status code is returned with a safe detail shape, and every call is bounded by a deadline over TLS.
Self-review checklist
- Ask: Did I add fields and RPCs without reusing, renumbering, or retyping any existing field number?
- Ask: Does every RPC return the correct status code on failure, never OK, with no internal detail leaked?
- Ask: Is every call bounded by a deadline and a message-size limit, over TLS, and authorised on the server?
- Ask: Are retries restricted to idempotent calls, with backoff and idempotency keys where a repeat could apply a change twice?
A gRPC service's contract lives in its .proto file, and field numbers, not names, define the wire format. That makes gRPC fast and strongly typed, but it also means a careless change can silently break every client in production. Stable field numbers, correct status codes, deadlines on every call, secure transport, and server-side authorisation keep gRPC services fast and safe. They also let those services evolve without breaking the teams that depend on them.