
Metrics and Prometheus

In this series (10 parts)
  1. The three pillars of observability
  2. Structured logging
  3. Metrics and Prometheus
  4. Grafana and dashboards
  5. Distributed tracing
  6. Log aggregation pipelines
  7. Alerting design
  8. SLIs, SLOs, and error budgets
  9. Real User Monitoring and synthetic testing
  10. On-call tooling and runbooks

Metrics tell you what is happening across your entire system right now. Unlike logs, which record individual events, metrics aggregate. A counter that incremented 50,000 times in the last minute occupies a single time series point. This compression is what makes metrics cheap to store and fast to query.

Prometheus is the dominant open-source metrics system. It uses a pull model, provides a powerful query language called PromQL, and integrates tightly with Kubernetes and Grafana.

The scrape model

Prometheus pulls metrics from your services by making HTTP GET requests to a /metrics endpoint. Each service exposes its current metric values in a simple text format.

# HELP http_requests_total Total HTTP requests received.
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/orders",status="200"} 142857
http_requests_total{method="POST",path="/api/orders",status="201"} 8432
http_requests_total{method="POST",path="/api/orders",status="500"} 17

# HELP http_request_duration_seconds Request duration in seconds.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.01"} 98234
http_request_duration_seconds_bucket{le="0.05"} 130000
http_request_duration_seconds_bucket{le="0.1"} 138500
http_request_duration_seconds_bucket{le="0.5"} 141900
http_request_duration_seconds_bucket{le="1.0"} 142800
http_request_duration_seconds_bucket{le="+Inf"} 142857
http_request_duration_seconds_sum 4821.3
http_request_duration_seconds_count 142857
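On each scrape, Prometheus parses lines in this format into samples: a metric name, a set of labels, and a value. A minimal stdlib sketch of that parsing (a simplification of the real exposition-format parser; it ignores timestamps and label escaping):

```python
import re

# One sample line: name{label="value",...} value
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                       r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_exposition(text):
    """Parse exposition-format text into (name, labels, value) samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = SAMPLE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m.group("labels") or ""))
            samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples

payload = '''
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/orders",status="200"} 142857
http_requests_total{method="POST",path="/api/orders",status="500"} 17
'''
print(parse_exposition(payload))
```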

The scrape interval (typically 15 or 30 seconds) determines resolution. Prometheus stores the scraped values as time series data points.

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "order-api"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__

Metric types

Counter

A monotonically increasing value. It resets to zero only on process restart. Use rate() or increase() to get meaningful values.

http_requests_total
errors_total
bytes_sent_total

Gauge

A value that goes up and down. Represents a current snapshot.

temperature_celsius
queue_depth
active_connections
memory_usage_bytes

Histogram

Distributes observations into configurable buckets. Buckets are cumulative: each le bucket counts every observation less than or equal to its bound. Essential for latency measurement because averages hide outliers.

http_request_duration_seconds_bucket{le="0.05"}  -- requests under 50ms
http_request_duration_seconds_bucket{le="0.1"}   -- requests under 100ms
http_request_duration_seconds_bucket{le="+Inf"}  -- all requests
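Because the buckets are cumulative, one observation increments every bucket whose le bound it does not exceed, plus the _sum and _count series. A stdlib sketch of that bookkeeping (bucket bounds are illustrative):

```python
class Histogram:
    """Minimal cumulative-bucket histogram, mirroring the exposition format."""
    def __init__(self, bounds=(0.01, 0.05, 0.1, 0.5, 1.0)):
        self.bounds = bounds
        self.buckets = {le: 0 for le in bounds}   # cumulative counts per le
        self.inf = 0                              # the +Inf bucket (all observations)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        for le in self.bounds:
            if value <= le:
                self.buckets[le] += 1             # every bucket with le >= value
        self.inf += 1
        self.sum += value
        self.count += 1

h = Histogram()
for v in (0.004, 0.03, 0.03, 0.2, 2.5):
    h.observe(v)
print(h.buckets, h.inf)  # {0.01: 1, 0.05: 3, 0.1: 3, 0.5: 4, 1.0: 4} 5
```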

Summary

Calculates quantiles on the client side. Less flexible than histograms because you cannot aggregate summaries across instances. Prefer histograms unless you have a specific reason to use summaries.

PromQL basics

PromQL selects and transforms time series. Start with a metric name and optional label filters:

http_requests_total{service="order-api", status=~"5.."}

This selects all time series for http_requests_total where service is order-api and status matches the regex 5.. (any 5xx status).

Rate and increase

Counters are cumulative. To get per-second request rate over the last 5 minutes:

rate(http_requests_total[5m])

To get the total increase over the last hour:

increase(http_requests_total[1h])
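Under the hood, rate() divides the counter's increase across a window by the window's length, treating any drop in value as a counter reset. A simplified sketch over raw (timestamp, value) samples (the real function also extrapolates to the window boundaries, which this omits):

```python
def simple_rate(samples):
    """Per-second rate from (timestamp, counter_value) samples, reset-aware."""
    increase = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 < v0:
            increase += v1        # counter reset: it restarted from zero
        else:
            increase += v1 - v0
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Four scrapes, 15s apart, with a reset after the second sample.
samples = [(0, 100), (15, 400), (30, 50), (45, 350)]
print(simple_rate(samples))  # (300 + 50 + 300) / 45 ≈ 14.44 requests/s
```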

Aggregation

Sum the request rate across all instances of a service:

sum(rate(http_requests_total[5m])) by (service)

Get the top 5 endpoints by error rate:

topk(5,
  sum(rate(http_requests_total{status=~"5.."}[5m])) by (path)
)
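sum(...) by (...) groups series that share the same values for the listed labels and discards every other label. A sketch of that grouping over per-series values (the series and numbers are illustrative):

```python
from collections import defaultdict

def sum_by(series, keys):
    """Sum series values grouped by the given label names, like `sum(...) by (...)`."""
    grouped = defaultdict(float)
    for labels, value in series:
        group = tuple(labels[k] for k in keys)  # labels not in `keys` are dropped
        grouped[group] += value
    return dict(grouped)

# Per-instance request rates for two services.
series = [
    ({"service": "order-api", "instance": "a"}, 12.0),
    ({"service": "order-api", "instance": "b"}, 8.0),
    ({"service": "billing", "instance": "a"}, 3.0),
]
print(sum_by(series, ["service"]))  # {('order-api',): 20.0, ('billing',): 3.0}
```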

Histogram quantiles

P99 latency from a histogram:

histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)

P50 (median) latency:

histogram_quantile(0.50,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)
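histogram_quantile() assumes observations are spread uniformly within each bucket and interpolates linearly to the target rank. A sketch of that estimate over cumulative (le, count) buckets, using the counts from the scrape example earlier (simplified; the real function operates on per-second bucket rates):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (le, count) buckets,
    ending with the ("+Inf", total) bucket, via linear interpolation."""
    total = buckets[-1][1]
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if le == "+Inf":      # rank falls beyond the last finite bound
                return prev_le
            # interpolate within [prev_le, le]
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count

buckets = [(0.01, 98234), (0.05, 130000), (0.1, 138500),
           (0.5, 141900), (1.0, 142800), ("+Inf", 142857)]
print(histogram_quantile(0.99, buckets))  # ≈ 0.445 (P99 latency in seconds)
```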

The RED method

RED stands for Rate, Errors, Duration. It gives you three queries per service that answer the most important questions:

# Rate: requests per second
sum(rate(http_requests_total{service="order-api"}[5m]))

# Errors: error percentage
sum(rate(http_requests_total{service="order-api",status=~"5.."}[5m]))
/
sum(rate(http_requests_total{service="order-api"}[5m]))
* 100

# Duration: P99 latency in milliseconds
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket{service="order-api"}[5m])) by (le)
) * 1000

Every service should expose these three signals. They map directly to user experience: are requests flowing, are they succeeding, and are they fast?

Recording rules

Complex PromQL queries evaluated on every dashboard load waste CPU. Recording rules pre-compute results and store them as new time series.

# recording-rules.yml
groups:
  - name: red_metrics
    interval: 30s
    rules:
      - record: service:http_request_rate:5m
        expr: sum(rate(http_requests_total[5m])) by (service)

      - record: service:http_error_ratio:5m
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
          /
          sum(rate(http_requests_total[5m])) by (service)

      - record: service:http_latency_p99:5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
          )

Recording rules also make alerting rules simpler. Alert on service:http_error_ratio:5m > 0.01 instead of repeating the full expression.

Alerting rules

Alerting rules fire when a PromQL expression is true for a specified duration:

# alerting-rules.yml
groups:
  - name: service_alerts
    rules:
      - alert: HighErrorRate
        expr: service:http_error_ratio:5m > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.service }}"
          description: "Error rate is {{ $value | humanizePercentage }} over the last 5 minutes."
          runbook: "https://wiki.example.com/runbooks/high-error-rate"

      - alert: HighLatency
        expr: service:http_latency_p99:5m > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 1s on {{ $labels.service }}"

The for clause prevents flapping. The alert only fires after the condition has been true continuously for the specified duration.
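The effect of for can be modeled as a small state machine: while the expression is true, the alert sits in a pending state, and it transitions to firing only once the condition has held for the full duration. A stdlib sketch (the evaluation interval and state names are illustrative; real Prometheus tracks the time of the first breach):

```python
def evaluate_alert(results, for_duration, interval=15):
    """Walk per-evaluation boolean results and yield alert states,
    mimicking Prometheus's inactive -> pending -> firing transitions."""
    true_for = 0
    for breached in results:
        if not breached:
            true_for = 0          # condition cleared: back to inactive
            yield "inactive"
            continue
        true_for += interval
        yield "firing" if true_for >= for_duration else "pending"

# for: 5m (300s), evaluated every 15s. A 4-minute blip stays pending
# and never fires.
blip = [True] * 16 + [False]
print(list(evaluate_alert(blip, 300))[-2:])  # ['pending', 'inactive']
```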

Federation

In large deployments, a single Prometheus server cannot scrape everything. Federation allows a global Prometheus to scrape pre-aggregated metrics from regional ones.

# global prometheus config
scrape_configs:
  - job_name: "federate-region-us"
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{__name__=~"service:.+"}'
    static_configs:
      - targets: ["prometheus-us:9090"]

  - job_name: "federate-region-eu"
    honor_labels: true
    metrics_path: /federate
    params:
      match[]:
        - '{__name__=~"service:.+"}'
    static_configs:
      - targets: ["prometheus-eu:9090"]

Federate only recording rules (pre-aggregated metrics), not raw metrics. This keeps the global instance lightweight.

For larger scale, consider Thanos or Cortex, which provide long-term storage, global querying, and horizontal scaling on top of Prometheus.

What comes next

Metrics are only useful when visualized. The next part, Grafana and dashboards, covers how to build effective dashboards on top of Prometheus, including template variables, alert panels, and dashboard-as-code workflows.
