GraphQL server design
In this series (15 parts)
- Backend system design scope
- Designing RESTful APIs
- Authentication and session management
- Database design for backend systems
- Caching in backend systems
- Background jobs and task queues
- File upload and storage
- Search integration
- Email and notification delivery
- Webhooks: design and security
- Payments integration
- Multi-tenancy patterns
- Backend for Frontend (BFF) pattern
- GraphQL server design
- gRPC and internal service APIs
GraphQL is a query language for APIs. Instead of the server deciding what data each endpoint returns, the client sends a query describing the exact shape of data it wants. The server resolves that query by calling the appropriate data sources and returns a response that matches the query shape precisely.
This solves the over-fetching and under-fetching problems that drive teams toward the BFF pattern. But GraphQL introduces its own set of challenges: the N+1 query problem, authorization complexity, and the temptation to expose your entire data model as a public API.
For foundational API design principles, see API design. For caching strategies that interact with GraphQL’s single-endpoint model, see caching.
Schema design
The schema is your contract. It defines every type, field, and relationship that clients can query. Good schema design is the difference between a GraphQL API that scales for years and one that requires a rewrite in six months.
```graphql
type Query {
  product(id: ID!): Product
  products(filter: ProductFilter, first: Int, after: String): ProductConnection!
}

type Product {
  id: ID!
  name: String!
  description: String
  price: Money!
  category: Category!
  reviews(first: Int, after: String): ReviewConnection!
  seller: User!
}

type Money {
  amount: Int!
  currency: String!
}

type ReviewConnection {
  edges: [ReviewEdge!]!
  pageInfo: PageInfo!
  totalCount: Int!
}
```
Design principles:
- Think in graphs, not tables. Your schema should model domain relationships, not mirror database tables. A `Product` has `reviews` and a `seller`, expressed as graph edges, regardless of how many database tables back them.
- Use connections for lists. The Relay connection spec (`edges`, `pageInfo`, `cursor`) is the standard pagination pattern. It supports cursor-based pagination, which is more reliable than offset-based pagination for large datasets.
- Custom scalars for domain types. Use `Money`, `DateTime`, and `URL` instead of raw `Int` and `String`. This adds type safety and documentation to the schema.
- Nullable by default, non-null by intention. Mark a field as non-null (`!`) only when you can guarantee it will always have a value. A field that returns null during partial failure is better than one that throws an error.
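The schema above leaves `PageInfo` and `ReviewEdge` undefined. Per the Relay connection spec, they look like this (the `Review` type itself is elided):

```graphql
# Standard Relay pagination metadata, shared by every connection
type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

# One edge per list item, carrying the cursor used for `after`
type ReviewEdge {
  cursor: String!
  node: Review!
}
```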
Resolvers
Each field in the schema maps to a resolver function. The resolver fetches the data for that field. For scalar fields on a type, the default resolver simply reads the property from the parent object. For relationships and computed fields, you write explicit resolvers.
```javascript
const resolvers = {
  Query: {
    product: (_, { id }, context) => context.dataSources.products.getById(id),
  },
  Product: {
    reviews: (product, { first, after }, context) =>
      context.dataSources.reviews.getByProductId(product.id, { first, after }),
    seller: (product, _, context) =>
      context.dataSources.users.getById(product.sellerId),
    price: (product) => ({
      amount: product.priceCents,
      currency: product.currency,
    }),
  },
};
```
Resolvers are composable. The Product.reviews resolver does not care how the product was fetched. It receives the product object and fetches reviews independently. This composability is powerful but creates the N+1 problem.
The N+1 problem
Consider this query:
```graphql
{
  products(first: 20) {
    edges {
      node {
        name
        seller {
          name
        }
      }
    }
  }
}
```
The products resolver runs one query to fetch 20 products. Then for each product, the seller resolver runs a separate query to fetch the seller. That is 1 + 20 = 21 database queries for a single GraphQL request. Add reviews to the query and you get 1 + 20 + 20 = 41 queries.
```mermaid
flowchart TB
  subgraph Without["Without DataLoader"]
    Q1["Query: 20 products"] --> DB1["1 DB query"]
    Q1 --> S1["Seller for product 1"] --> DB2["1 DB query"]
    Q1 --> S2["Seller for product 2"] --> DB3["1 DB query"]
    Q1 --> S3["Seller for product 3"] --> DB4["1 DB query"]
    Q1 --> Sdots["... 17 more"] --> DBdots["17 DB queries"]
  end
  subgraph With["With DataLoader"]
    Q2["Query: 20 products"] --> DB5["1 DB query"]
    Q2 --> Batch["Batch: all 20 seller IDs"] --> DB6["1 DB query<br/>WHERE id IN (...)"]
  end
```
The N+1 problem and DataLoader’s batching solution. Without DataLoader: 21 queries. With DataLoader: 2 queries.
DataLoader
DataLoader solves N+1 by batching and caching within a single request. Instead of fetching each seller immediately, the resolver registers the seller ID with DataLoader. At the end of the current execution tick, DataLoader collects all registered IDs and makes a single batched query.
```javascript
// Create a DataLoader per request
const userLoader = new DataLoader(async (userIds) => {
  const users = await db.query(
    "SELECT * FROM users WHERE id = ANY($1)",
    [userIds]
  );
  // Return in the same order as the input IDs
  const userMap = new Map(users.map(u => [u.id, u]));
  return userIds.map(id => userMap.get(id) || null);
});

// In the resolver
const resolvers = {
  Product: {
    seller: (product, _, context) =>
      context.loaders.user.load(product.sellerId),
  },
};
```
Critical rules for DataLoader:
- Create a new instance per request. DataLoader caches results for its lifetime; sharing an instance across requests serves stale data and can leak one user's results into another's response.
- Return results in input order. The batch function must return an array aligned with the input keys array.
- Handle missing keys. Return `null` for IDs that do not exist; do not skip them.
- Scope to one data source. Each DataLoader should correspond to one table or service. Do not try to batch across unrelated sources.
DataLoader keeps query count constant regardless of result size. Without it, query count grows linearly.
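To make the batching mechanics concrete, here is a toy stand-in for DataLoader (not the real `dataloader` package): `load()` queues keys, and a microtask scheduled at the end of the current tick flushes the whole batch in one call.

```javascript
// Toy stand-in for DataLoader, sketching batching and per-request caching.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];        // pending { key, resolve } entries
    this.cache = new Map(); // per-request cache: key -> Promise
  }
  load(key) {
    if (this.cache.has(key)) return this.cache.get(key); // dedupe repeats
    const promise = new Promise((resolve) => {
      // The first key queued this tick schedules a flush at end of tick
      if (this.queue.length === 0) queueMicrotask(() => this.flush());
      this.queue.push({ key, resolve });
    });
    this.cache.set(key, promise);
    return promise;
  }
  async flush() {
    const batch = this.queue;
    this.queue = [];
    // One batched call covers every key collected during the tick
    const results = await this.batchFn(batch.map((entry) => entry.key));
    batch.forEach((entry, i) => entry.resolve(results[i]));
  }
}

// Three load() calls in one tick -> one batched "query"
let batchCalls = 0;
const loader = new TinyLoader(async (ids) => {
  batchCalls += 1;
  return ids.map((id) => ({ id, name: `user-${id}` }));
});

const demo = Promise.all([
  loader.load(1),
  loader.load(2),
  loader.load(1), // served from the per-request cache
]).then(([a, b, c]) => {
  console.log(batchCalls, a.name, b.name, c === a); // 1 user-1 user-2 true
  return [a, b, c];
});
```

The real DataLoader adds error propagation, custom cache keys, and scheduling controls, but the shape is the same.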
Authorization at the resolver level
GraphQL’s flexible queries mean you cannot rely on endpoint-level authorization. A single query might access products (public), order history (requires authentication), and admin analytics (requires admin role). Authorization must happen at the resolver level.
```javascript
const resolvers = {
  Query: {
    product: (_, { id }, context) => {
      // Public, no auth needed
      return context.dataSources.products.getById(id);
    },
    myOrders: (_, args, context) => {
      if (!context.user) throw new AuthenticationError("Login required");
      return context.dataSources.orders.getByUserId(context.user.id, args);
    },
    adminAnalytics: (_, args, context) => {
      if (context.user?.role !== "admin")
        throw new ForbiddenError("Admin access required");
      return context.dataSources.analytics.getSummary(args);
    },
  },
  Order: {
    paymentDetails: (order, _, context) => {
      // Field-level auth: only the order owner can see payment details
      if (order.userId !== context.user?.id)
        throw new ForbiddenError("Not your order");
      return context.dataSources.payments.getByOrderId(order.id);
    },
  },
};
```
Patterns for authorization:
- Directive-based: define custom schema directives like `@auth(role: ADMIN)` and apply them declaratively in the schema.
- Middleware layer: wrap resolvers with an authorization function that checks permissions before executing the resolver.
- Inline checks: check permissions inside each resolver. Simple but repetitive.
The directive approach is cleanest for large schemas. The schema itself documents who can access what.
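The middleware-layer option can be sketched as a higher-order function that wraps a resolver. `requireRole` and the error classes here are illustrative names, not a specific library's API:

```javascript
// Illustrative error classes (Apollo Server ships similar ones)
class AuthenticationError extends Error {}
class ForbiddenError extends Error {}

// Wrap a resolver in a permission check instead of repeating inline checks
const requireRole = (role, resolver) => (parent, args, context, info) => {
  if (!context.user) throw new AuthenticationError("Login required");
  if (context.user.role !== role)
    throw new ForbiddenError(`${role} access required`);
  return resolver(parent, args, context, info);
};

const resolvers = {
  Query: {
    adminAnalytics: requireRole("admin", (_, args, context) =>
      context.dataSources.analytics.getSummary(args)
    ),
  },
};

// Quick check with a stubbed context:
const adminContext = {
  user: { role: "admin" },
  dataSources: { analytics: { getSummary: () => ({ ok: true }) } },
};
console.log(resolvers.Query.adminAnalytics(null, {}, adminContext)); // { ok: true }
```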
Persisted queries
In production, letting clients send arbitrary query strings is both a performance and security risk. An attacker can craft deeply nested queries that consume enormous server resources. A query like { user { friends { friends { friends { ... } } } } } nested 50 levels deep can bring your server down.
Persisted queries solve this. During your build step, extract all queries your clients use and register them with the server. At runtime, clients send a query hash instead of the full query text. The server looks up the hash, finds the pre-registered query, and executes it.
Benefits:
- Security: only pre-approved queries can run. No arbitrary queries from attackers.
- Performance: smaller request payloads (just a hash instead of a full query string). The server can pre-plan query execution.
- Caching: CDNs can cache responses keyed by the query hash via GET requests.
If you cannot use persisted queries (because your API is public and you want clients to write their own queries), implement query complexity analysis. Assign a cost to each field and reject queries that exceed a threshold.
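A depth limit is the simplest form of complexity analysis. Here is a toy sketch over a simplified selection tree; a production server would instead walk the parsed AST from the `graphql` package, typically via a validation rule:

```javascript
// Each node is { name, selections: [...] }; leaves have empty selections.
const depthOf = (selection) =>
  selection.selections && selection.selections.length
    ? 1 + Math.max(...selection.selections.map(depthOf))
    : 1;

const MAX_DEPTH = 10; // tune per schema

const rejectIfTooDeep = (query) => {
  const depth = depthOf(query);
  if (depth > MAX_DEPTH) {
    throw new Error(`Query depth ${depth} exceeds limit ${MAX_DEPTH}`);
  }
  return depth;
};

// "{ user { name } }" as a simplified tree:
const example = {
  name: "query",
  selections: [{ name: "user", selections: [{ name: "name", selections: [] }] }],
};
console.log(rejectIfTooDeep(example)); // 3
```

Cost-based analysis extends the same walk: instead of counting levels, sum a per-field cost, multiplying list fields by their requested page size.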
Caching with GraphQL
GraphQL’s single endpoint (POST /graphql) makes HTTP caching harder than REST. With REST, each URL is a natural cache key. With GraphQL, every request hits the same URL with a different body.
Strategies:
| Strategy | How it works | Trade-offs |
|---|---|---|
| Response caching | Cache full responses keyed by query hash + variables | Coarse-grained, stale data risk |
| Persisted queries via GET | GET /graphql?id=abc&variables={} | CDN-cacheable, requires persisted queries |
| Field-level caching | Cache individual resolver results in Redis | Fine-grained, complex invalidation |
| DataLoader request cache | Deduplicate within a single request | Automatic, no cross-request benefit |
For most applications, field-level caching with DataLoader gives the best balance. Cache frequently accessed, slowly changing data (product details, user profiles) at the resolver level with TTLs. Let DataLoader handle within-request deduplication.
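A sketch of that resolver-level caching, with an in-memory `Map` standing in for Redis and illustrative key and TTL choices:

```javascript
// Minimal sketch of resolver-level TTL caching.
const cache = new Map(); // key -> { value, expiresAt }

const cached = (keyFn, ttlMs, resolver) =>
  async (parent, args, context, info) => {
    const key = keyFn(parent, args);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit
    const value = await resolver(parent, args, context, info);
    cache.set(key, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };

// Product details change slowly, so a 60s TTL is a reasonable start
let dbCalls = 0;
const getProduct = cached(
  (_, args) => `product:${args.id}`,
  60_000,
  async (_, args) => {
    dbCalls += 1; // counts actual data-source hits
    return { id: args.id, name: "Widget" };
  }
);
```

With Redis the `Map` operations become `GET`/`SET` with an expiry, and invalidation on writes becomes the hard part.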
When GraphQL is the wrong choice
GraphQL is not universally superior to REST. It is the wrong choice when:
- Your API is simple. If you have ten CRUD endpoints with straightforward request/response shapes, GraphQL adds schema complexity without payoff.
- File uploads. GraphQL has no native file upload support. You end up bolting on multipart specs or using a separate REST endpoint anyway.
- Real-time streaming. GraphQL subscriptions exist but are more complex to scale than WebSockets or SSE with REST endpoints.
- Internal service-to-service calls. For backend microservice communication, gRPC is faster and provides stronger type safety with less overhead. GraphQL is designed for client-server boundaries.
- Your team is small. The schema, resolver, DataLoader, and authorization layers add development overhead. For a team of two, REST with well-designed endpoints ships faster.
Each API style has a sweet spot. GraphQL excels for client-facing APIs with complex, variable data needs. REST wins for simplicity. gRPC wins for internal services.
Schema evolution
GraphQL schemas evolve differently from REST APIs. There are no URL-based versions. Instead, you evolve the schema by adding new fields and deprecating old ones.
```graphql
type Product {
  id: ID!
  name: String!
  price: Money!
  # Deprecated in favor of price, which carries currency information
  priceInCents: Int @deprecated(reason: "Use the price field instead")
}
```
Rules for schema evolution:
- Never remove a field without deprecation. Add the `@deprecated` directive with a reason and a migration path. Monitor usage of deprecated fields. Remove only when usage drops to zero.
- Never change a field's type. If `price` was an `Int` and you need it to be a `Money` object, add a new field (`priceV2` or, better, `price` as `Money` with the old field renamed to `priceInCents`).
- Additive changes are safe. Adding new fields, new types, and new enum values never breaks existing clients.
- Use input types for mutations. Instead of positional arguments, use input objects. This makes adding optional fields backward-compatible.
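To illustrate the input-type rule, here is a sketch of an update mutation; `UpdateProductInput` and `MoneyInput` are hypothetical names:

```graphql
input MoneyInput {
  amount: Int!
  currency: String!
}

input UpdateProductInput {
  id: ID!
  name: String
  price: MoneyInput
}

type Mutation {
  # Adding a new optional field to UpdateProductInput later is
  # backward-compatible; adding a new positional argument is not.
  updateProduct(input: UpdateProductInput!): Product!
}
```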
Monitoring and observability
Track these metrics for your GraphQL server:
- Query complexity distribution: identify expensive queries before they cause outages.
- Resolver latency per field: find slow resolvers. A single slow resolver in a deeply nested query multiplies its impact.
- Error rate by field and query: catch resolver-level failures that might not surface as top-level errors.
- DataLoader batch sizes: small batches mean DataLoader is not coalescing well. Large batches might indicate a query that should be paginated.
- Cache hit rate per resolver: verify that your caching strategy is effective.
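Per-field latency can be collected with a resolver wrapper along these lines; a real setup would export to a metrics system (Prometheus, OpenTelemetry, or a server plugin), and the `metrics` array here is a stand-in sink:

```javascript
// Sketch of per-field latency tracking via a resolver wrapper.
const metrics = [];

const timed = (fieldName, resolver) => async (...resolverArgs) => {
  const start = process.hrtime.bigint();
  try {
    return await resolver(...resolverArgs);
  } finally {
    // Record duration in ms even when the resolver throws
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    metrics.push({ field: fieldName, ms });
  }
};

// Wrap the resolvers you want to observe
const resolvers = {
  Product: {
    seller: timed("Product.seller", (product, _, context) =>
      context.dataSources.users.getById(product.sellerId)
    ),
  },
};
```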
What comes next
GraphQL works well at the client-server boundary. For service-to-service communication inside your backend, the next article covers gRPC and internal service APIs: protocol buffers, streaming RPCs, deadlines, and when to use gRPC-gateway for HTTP fallback.