gRPC and internal service APIs
In this series (15 parts)
- Backend system design scope
- Designing RESTful APIs
- Authentication and session management
- Database design for backend systems
- Caching in backend systems
- Background jobs and task queues
- File upload and storage
- Search integration
- Email and notification delivery
- Webhooks: design and security
- Payments integration
- Multi-tenancy patterns
- Backend for Frontend (BFF) pattern
- GraphQL server design
- gRPC and internal service APIs
Inside your backend, services talk to each other constantly. An order service calls the inventory service. The notification service calls the user service. The analytics pipeline streams events from every service. REST works for these calls, but it was designed for the client-server boundary, not for internal communication where both sides are services you control.
gRPC was built for this. It uses HTTP/2 for multiplexing, protocol buffers for compact binary serialization, and code generation for type-safe clients in every major language. The result is internal APIs that are faster, smaller on the wire, and harder to misuse than JSON-over-HTTP.
For broader API design context, see API design. For patterns on how services discover and communicate with each other, see microservice communication.
Why gRPC for internal services
The case for gRPC internally comes down to three properties:
Binary serialization. Protocol buffers encode data in a compact binary format. A message that is 1 KB as JSON might be 300 bytes as protobuf. For internal services making millions of calls per minute, the bandwidth and serialization savings are significant.
Strong typing with code generation. You define your API in a .proto file. The protobuf compiler generates client and server stubs in Go, Java, Python, TypeScript, or any supported language. If the schema changes in a way that breaks compatibility, the code does not compile. Compare this to REST, where a renamed JSON field silently breaks a consumer at runtime.
HTTP/2 multiplexing. gRPC runs over HTTP/2, which multiplexes multiple requests over a single TCP connection. No head-of-line blocking at the HTTP level. Connection reuse reduces the overhead of TLS handshakes and TCP slow-start that accumulate with REST’s typical connection-per-request pattern.
Taken together, gRPC with protobuf is cheaper on every axis: fewer bytes on the wire, less CPU spent on serialization, and fewer TLS and TCP handshakes. Over millions of internal requests, these savings compound.
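To make the wire-size difference concrete, here is a sketch that encodes the same stock-check message as JSON and as protobuf wire bytes. The protobuf bytes are hand-rolled purely for illustration; real services use protoc-generated marshalers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkStockResponse mirrors the CheckStockResponse message defined in
// the schema section below; field names exist only in the JSON form.
type checkStockResponse struct {
	ProductID         string `json:"product_id"`
	AvailableQuantity int32  `json:"available_quantity"`
	IsReserved        bool   `json:"is_reserved"`
}

func encodeJSON(m checkStockResponse) []byte {
	b, _ := json.Marshal(m)
	return b
}

// encodeProto hand-rolls the protobuf wire bytes for the same message.
// Each field is a tag byte (field_number<<3 | wire_type) followed by
// its value; field names never appear on the wire.
func encodeProto(m checkStockResponse) []byte {
	var b []byte
	b = append(b, 0x0A, byte(len(m.ProductID))) // field 1, length-delimited
	b = append(b, []byte(m.ProductID)...)
	b = append(b, 0x10, byte(m.AvailableQuantity)) // field 2, varint (small values fit one byte)
	if m.IsReserved {
		b = append(b, 0x18, 0x01) // field 3, varint; default (false) is simply omitted
	}
	return b
}

func main() {
	m := checkStockResponse{ProductID: "prod_123", AvailableQuantity: 42, IsReserved: true}
	fmt.Printf("JSON: %d bytes, protobuf: %d bytes\n", len(encodeJSON(m)), len(encodeProto(m)))
}
```

The ratio only grows with larger messages, since every JSON value repeats its field name while protobuf pays one tag byte per field.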
Protocol buffers schema design
A .proto file defines your service contract. Here is a typical service definition:
syntax = "proto3";
package inventory;
option go_package = "github.com/myorg/inventory/pb";
service InventoryService {
rpc CheckStock(CheckStockRequest) returns (CheckStockResponse);
rpc ReserveItems(ReserveItemsRequest) returns (ReserveItemsResponse);
rpc ReleaseReservation(ReleaseRequest) returns (ReleaseResponse);
rpc WatchStockLevels(WatchRequest) returns (stream StockUpdate);
}
message CheckStockRequest {
string product_id = 1;
string warehouse_id = 2;
}
message CheckStockResponse {
string product_id = 1;
int32 available_quantity = 2;
bool is_reserved = 3;
}
message ReserveItemsRequest {
string order_id = 1;
repeated LineItem items = 2;
string idempotency_key = 3;
}
message LineItem {
string product_id = 1;
int32 quantity = 2;
}
Schema design rules:
- Use field numbers, not names, for wire format. Protobuf serializes by field number, so you can rename fields freely without breaking compatibility. Never reuse a field number after removing a field; mark it as reserved.
- Add fields, never remove. Adding new fields with new numbers is always backward-compatible. Old clients ignore unknown fields; old servers use default values for missing fields.
- Use wrapper types for nullable fields. In proto3, scalar fields cannot distinguish between “not set” and “default value.” Use google.protobuf.StringValue when null vs empty string matters.
- Small, focused services. Each .proto file should define one service with a cohesive set of RPCs. Do not create a god-service with 50 methods.
- Include an idempotency key in mutating requests. This is the same principle from payment APIs applied to internal services.
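As a sketch of the reserved rule: if `is_reserved` were ever dropped from CheckStockResponse, its number and name should be retired rather than recycled (the `incoming_quantity` field here is hypothetical, added only to show where a replacement would go):

```proto
message CheckStockResponse {
  string product_id = 1;
  int32 available_quantity = 2;
  // Field 3 (is_reserved) was removed. Reserving its number and name
  // prevents a future field from being decoded with stale semantics
  // by old binaries still in production.
  reserved 3;
  reserved "is_reserved";
  int32 incoming_quantity = 4;  // new fields always get fresh numbers
}
```

The protobuf compiler rejects any attempt to declare a field with a reserved number or name, turning a subtle runtime corruption bug into a build failure.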
The four RPC types
gRPC supports four communication patterns:
Unary RPC
One request, one response. This is the equivalent of a REST endpoint. The client sends a request and waits for a response.
rpc CheckStock(CheckStockRequest) returns (CheckStockResponse);
Use for: lookups, CRUD operations, anything that fits request/response.
Server streaming
The client sends one request. The server sends back a stream of responses.
rpc WatchStockLevels(WatchRequest) returns (stream StockUpdate);
Use for: real-time feeds, log tailing, long-running query results that arrive incrementally.
Client streaming
The client sends a stream of messages. The server reads them and returns a single response.
rpc UploadTelemetry(stream TelemetryEvent) returns (UploadSummary);
Use for: bulk uploads, telemetry ingestion, file uploads in chunks.
Bidirectional streaming
Both sides send streams of messages independently. Neither side has to wait for the other.
rpc Chat(stream ChatMessage) returns (stream ChatMessage);
sequenceDiagram
participant Client
participant Server
Note over Client,Server: Bidirectional Streaming RPC
Client->>Server: open stream
Server->>Client: stream open
par Client sends
Client->>Server: message 1
Client->>Server: message 2
Client->>Server: message 3
and Server sends
Server->>Client: response A
Server->>Client: response B
end
Client->>Server: message 4
Server->>Client: response C
Server->>Client: response D
Client->>Server: close send
Server->>Client: close stream (status OK)
Bidirectional streaming RPC. Both client and server send messages independently on the same connection. Messages interleave freely.
Bidirectional streaming is the most powerful pattern but also the hardest to debug. Use it for: chat systems, collaborative editing, multiplayer game state sync, or any scenario where both sides produce data asynchronously.
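The independence of the two directions can be modeled with plain channels, no gRPC required. This sketch stands in for a bidirectional stream: the client closes its send side, and the server finishes its responses before closing the stream (names and the ack-per-message pairing are illustrative; real bidi streams need not pair messages 1:1).

```go
package main

import "fmt"

// echoServer plays the server side of the stream: it reads client
// messages until the client closes its send direction, then closes
// its own response stream (the "status OK" moment in gRPC terms).
func echoServer(in <-chan string, out chan<- string) {
	for m := range in {
		out <- "ack: " + m
	}
	close(out)
}

func main() {
	clientToServer := make(chan string, 4)
	serverToClient := make(chan string, 4)

	go echoServer(clientToServer, serverToClient)

	// Client side: send a few messages, then "close send".
	clientToServer <- "message 1"
	clientToServer <- "message 2"
	close(clientToServer)

	// Client keeps reading until the server closes the stream.
	for m := range serverToClient {
		fmt.Println(m)
	}
}
```

gRPC's generated stream types expose the same shape: `Send`/`CloseSend` on one side, `Recv` until `io.EOF` on the other, with both directions progressing independently over a single HTTP/2 stream.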
Deadlines and cancellation
Every gRPC call should have a deadline. A deadline is an absolute point in time by which the call must complete. If the deadline passes, the call fails with a DEADLINE_EXCEEDED status. This is different from a timeout (relative duration), though in practice you often set a deadline by adding a timeout to the current time.
Why deadlines matter:
- Cascade prevention. Service A calls B, B calls C. Without deadlines, if C is slow, A and B both hang. With deadlines, C’s slow response triggers DEADLINE_EXCEEDED in B, which propagates back to A. Resources are released.
- Deadline propagation. gRPC propagates deadlines across service calls. If A sets a 500ms deadline and the call to B takes 200ms, B’s call to C has 300ms remaining. The system automatically budgets time across the call chain.
- Cancellation. When a deadline expires or a client disconnects, gRPC sends a cancellation signal to the server. Well-written servers check for cancellation and abort expensive work early.
// Go example: setting a deadline
ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
defer cancel()
resp, err := inventoryClient.CheckStock(ctx, &pb.CheckStockRequest{
ProductId: "prod_123",
WarehouseId: "wh_us_east",
})
if err != nil {
st, ok := status.FromError(err)
if ok && st.Code() == codes.DeadlineExceeded {
// Handle timeout: return cached data or fail gracefully
}
}
Set different deadlines for different call types. A stock check might get 200ms. A bulk data sync might get 30 seconds. An analytics query might get 5 seconds. Never use the same timeout for everything.
Error handling with status codes
gRPC defines a set of status codes that map cleanly to error categories:
| Code | Meaning | When to use |
|---|---|---|
| OK | Success | Everything worked |
| INVALID_ARGUMENT | Client sent bad data | Validation failures |
| NOT_FOUND | Resource does not exist | Missing entity |
| ALREADY_EXISTS | Duplicate creation | Idempotency conflict |
| PERMISSION_DENIED | Caller lacks permission | Authorization failure |
| UNAUTHENTICATED | No valid credentials | Missing or expired token |
| RESOURCE_EXHAUSTED | Quota exceeded | Rate limiting |
| UNAVAILABLE | Service is down | Transient failure, client should retry |
| DEADLINE_EXCEEDED | Timeout | Call took too long |
| INTERNAL | Server bug | Unexpected error |
Use these codes consistently across all your services. They enable uniform error handling in client code and meaningful monitoring dashboards.
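One thing consistent codes buy you is a single retry policy shared by every client. The sketch below defines a local code type so it is self-contained; real clients would switch on `codes.Code` from google.golang.org/grpc/codes instead.

```go
package main

import "fmt"

// code mirrors a subset of gRPC's status codes, defined locally so
// this sketch compiles without the grpc module.
type code int

const (
	codeOK code = iota
	codeInvalidArgument
	codeNotFound
	codeResourceExhausted
	codeUnavailable
	codeDeadlineExceeded
	codeInternal
)

// retryable reports whether a failed call is worth retrying. Only
// transient conditions qualify; retrying INVALID_ARGUMENT or NOT_FOUND
// just repeats a deterministic failure.
func retryable(c code) bool {
	switch c {
	case codeUnavailable, codeResourceExhausted:
		return true // server down or throttling: back off and retry
	case codeDeadlineExceeded:
		return true // often transient, but only retry if deadline budget remains
	default:
		return false
	}
}

func main() {
	fmt.Println(retryable(codeUnavailable)) // transient failure
	fmt.Println(retryable(codeNotFound))    // deterministic failure
}
```

Because every service reports UNAVAILABLE the same way, this one function can live in a shared client library instead of being reinvented per service.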
Attach structured error details using gRPC’s error details API. Instead of encoding context in a string message, use typed error payloads:
message StockError {
string product_id = 1;
int32 requested_quantity = 2;
int32 available_quantity = 3;
}
gRPC-gateway: HTTP fallback
Not every consumer of your internal APIs can speak gRPC. Browser clients, third-party integrations, and legacy systems often need REST. gRPC-gateway is a reverse proxy that translates RESTful HTTP/JSON requests into gRPC calls.
You annotate your proto file with HTTP mappings:
import "google/api/annotations.proto";
service InventoryService {
rpc CheckStock(CheckStockRequest) returns (CheckStockResponse) {
option (google.api.http) = {
get: "/v1/inventory/{product_id}/stock"
};
}
}
The gateway generates a reverse proxy that accepts GET /v1/inventory/prod_123/stock, translates it to a CheckStock gRPC call, and returns the response as JSON.
flowchart LR
Browser["Browser / REST Client"] -->|"GET /v1/inventory/prod_123/stock"| GW["gRPC-Gateway"]
GW -->|"gRPC: CheckStock"| Svc["Inventory Service"]
Svc -->|"gRPC response"| GW
GW -->|"JSON response"| Browser
Internal["Internal Service"] -->|"gRPC: CheckStock"| Svc
gRPC-gateway provides an HTTP/JSON interface for external consumers while internal services communicate directly via gRPC.
This gives you the best of both worlds: internal services get the performance of gRPC, while external consumers get a familiar REST API, all from a single service definition.
Interceptors and middleware
gRPC interceptors are the equivalent of HTTP middleware. They wrap every RPC call with cross-cutting logic:
- Logging: log every request with method name, duration, and status code.
- Metrics: emit latency histograms and error counters per method.
- Authentication: extract and validate tokens from metadata (gRPC’s equivalent of headers).
- Rate limiting: reject calls that exceed per-service or per-method limits.
- Retry: automatically retry UNAVAILABLE errors with backoff.
Stack interceptors in a specific order. Authentication should run first (reject unauthenticated calls before doing any work). Logging should run last (capture the final status after all processing).
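The ordering rule can be sketched without gRPC by chaining plain function wrappers; conceptually this is what grpc-go's interceptor chaining does, with the first interceptor listed running first on the way in. All names here are illustrative.

```go
package main

import "fmt"

// handler stands in for the innermost RPC method; an interceptor
// wraps a handler with cross-cutting logic.
type handler func(req string) (string, error)
type interceptor func(next handler) handler

// chain wraps h so that the first interceptor listed is outermost,
// i.e. runs first when a request arrives.
func chain(h handler, ics ...interceptor) handler {
	for i := len(ics) - 1; i >= 0; i-- {
		h = ics[i](h)
	}
	return h
}

func main() {
	auth := func(next handler) handler {
		return func(req string) (string, error) {
			fmt.Println("auth: validating credentials") // runs first: reject before any work
			return next(req)
		}
	}
	logging := func(next handler) handler {
		return func(req string) (string, error) {
			resp, err := next(req)
			fmt.Println("logging: recording final status") // runs after the handler finishes
			return resp, err
		}
	}

	rpc := chain(func(req string) (string, error) {
		return "ok", nil // the actual RPC method body
	}, auth, logging)

	rpc("CheckStock")
}
```

With this wiring, an unauthenticated call is rejected before the handler ever runs, and the logging wrapper observes the status that the caller will actually receive.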
Load balancing
gRPC’s use of long-lived HTTP/2 connections complicates load balancing. A traditional L4 load balancer distributes TCP connections, but gRPC multiplexes many requests over a single connection, so balancing connections does not balance requests: one backend can end up serving 90% of the traffic while a freshly added backend sees almost none.
Solutions:
- L7 load balancing. Use a load balancer that understands HTTP/2 frames and distributes individual requests, not connections. Envoy, Linkerd, and most service meshes support this.
- Client-side load balancing. The gRPC client maintains a list of server addresses (from a service registry or DNS) and distributes requests across them. Simpler infrastructure, but every client must implement the logic.
- Lookaside load balancing. A separate load balancer service tells clients which server to use for each request. Google’s internal system uses this approach.
For most teams, an L7 proxy (Envoy or a service mesh sidecar) is the right answer. It handles connection management, health checking, and request distribution without requiring changes to client code.
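At its simplest, client-side balancing is a round-robin picker over a resolved address list. This is a stdlib sketch with hypothetical addresses; in grpc-go the equivalent is enabling the built-in round_robin load-balancing policy rather than writing your own.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// picker hands out backend addresses round-robin. Real implementations
// also subscribe to resolver updates and skip unhealthy backends.
type picker struct {
	addrs []string
	next  atomic.Uint64 // atomic so concurrent RPCs can share one picker
}

func (p *picker) pick() string {
	n := p.next.Add(1) - 1
	return p.addrs[n%uint64(len(p.addrs))]
}

func main() {
	p := &picker{addrs: []string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"}}
	for i := 0; i < 4; i++ {
		fmt.Println(p.pick()) // cycles through the three backends, then wraps
	}
}
```

The hard parts are not the picking but the bookkeeping around it: watching the registry for membership changes and ejecting backends that fail health checks, which is exactly what an L7 proxy or mesh sidecar does for you.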
When not to use gRPC
gRPC is the wrong choice when:
- Your consumer is a browser. gRPC-Web exists but requires a proxy and has limitations (no bidirectional streaming). For browser clients, use REST or GraphQL.
- Human readability matters. Protobuf messages are binary. You cannot curl a gRPC endpoint and read the response. During development and debugging, this friction is real.
- Your team is polyglot with poor tooling. While protobuf supports many languages, the quality of gRPC libraries varies. Go and Java have excellent support. Some languages have immature or poorly maintained libraries.
- Simple CRUD with few services. If you have three services with five endpoints each, the overhead of proto files, code generation, and gRPC infrastructure is not justified. REST is simpler.
Testing gRPC services
Testing gRPC services requires a slightly different approach than REST:
- Unit test handlers. Test the business logic behind each RPC method independently from the gRPC framework.
- Integration test with an in-process server. Start a gRPC server in your test process, connect a client to it, and make real RPC calls. This tests serialization, interceptors, and error handling.
- Contract tests with proto files. Use buf breaking or similar tools to detect backward-incompatible schema changes in CI. This prevents accidental breaking changes from reaching production.
- Load test with ghz or grpcurl. Measure latency and throughput under realistic load. gRPC’s binary protocol and connection reuse mean performance characteristics differ significantly from REST.
# Example: load test with ghz
ghz --insecure \
--proto inventory.proto \
--call inventory.InventoryService.CheckStock \
--data '{"product_id": "prod_123", "warehouse_id": "wh_us_east"}' \
--connections 10 \
--concurrency 100 \
--duration 30s \
localhost:50051
Migration from REST to gRPC
If you are migrating internal services from REST to gRPC, do it incrementally:
- Define protos for existing endpoints. Model your current REST request/response shapes as protobuf messages.
- Run gRPC and REST side by side. The service accepts both protocols. Route internal traffic to gRPC, keep external traffic on REST.
- Migrate consumers one at a time. Update each calling service to use the generated gRPC client. Verify behavior matches.
- Remove the REST handler once all internal consumers have migrated. Keep gRPC-gateway for any external consumers.
This approach avoids a big-bang migration and lets you validate gRPC’s performance benefits incrementally.
What comes next
This article covered gRPC for synchronous internal communication. But not every internal interaction fits request/response. Event-driven architectures, CQRS patterns, and message bus designs handle cases where services need to react to changes without tight coupling. Future articles in this series will explore those patterns.