Secrets management
A database password sitting in an environment variable is better than one hardcoded in source. But it is not good enough. That password is static. It never expires. Every developer who has ever had shell access to production has seen it. If compromised, nobody knows until something breaks.
Secrets management is the practice of storing, distributing, rotating, and auditing credentials so that a breach in one component does not cascade into a full system compromise.
Why environment variables fall short
Environment variables solved the “don’t commit passwords to Git” problem. They did not solve the bigger problems.
No access control. If a process can read one environment variable, it can read all of them. Your application reads DATABASE_URL and also has access to STRIPE_SECRET_KEY, JWT_SIGNING_KEY, and every other secret injected into the same process.
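A minimal sketch makes the point concrete. The helper name `secret_like_names` and the marker list are illustrative assumptions; the variable names come from the paragraph above. Any code running in the process, including a third-party dependency, can make the same call.

```python
import os

# Substrings that commonly appear in credential-bearing variable names.
# This list is an illustrative assumption, not an exhaustive rule.
SECRET_MARKERS = ("SECRET", "KEY", "PASSWORD", "TOKEN", "DATABASE_URL")


def secret_like_names(env):
    """Return the sorted names in `env` that look like they hold credentials."""
    return sorted(
        name for name in env
        if any(marker in name.upper() for marker in SECRET_MARKERS)
    )


if __name__ == "__main__":
    # Nothing scopes this call: every variable injected into the process is visible.
    print(secret_like_names(os.environ))
```

The environment has no notion of "this module may read DATABASE_URL but not STRIPE_SECRET_KEY". The whole mapping is handed to everything in the process.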
No rotation. Changing a secret in an environment variable means restarting the process. In a Kubernetes deployment that could mean rolling all pods. Teams avoid rotation because it is disruptive. Secrets become stale. Stale secrets are compromised secrets waiting to be discovered.
No audit trail. Who read the production database password? When? From which IP? Environment variables tell you nothing.
Persistence in memory and logs. Environment variables live in process memory for the entire application lifetime. They show up in debug dumps, crash reports, and sometimes in log output from poorly configured frameworks.
The secrets management model
A proper secrets manager introduces four capabilities that environment variables lack:
- Centralized encrypted storage with access policies
- Dynamic secrets that are generated on demand and expire automatically
- Automatic rotation without application restarts
- Comprehensive audit logging of every access
graph TD
    APP["Application"] -->|"Authenticate"| V["Secrets Manager<br/>(Vault / AWS SM)"]
    V -->|"Issue short-lived credential"| APP
    V --> ES["Encrypted Storage"]
    V --> AL["Audit Log"]
    V --> RP["Rotation Policy"]
    RP -->|"Rotate on schedule"| ES
    ADMIN["Security Team"] -->|"Define policies"| V
A secrets manager provides encrypted storage, dynamic issuance, rotation, and audit logging.
Vault concepts
HashiCorp Vault is one of the most widely adopted secrets managers (open source until its 2023 move to the Business Source License). Understanding its model gives you a framework that applies to AWS Secrets Manager, Google Secret Manager, and Azure Key Vault as well.
Authentication methods
Before Vault gives you a secret, you prove who you are. Vault supports multiple auth methods:
- Token auth: Direct token presentation. Simple but requires token management.
- AppRole: Machine-oriented auth with a role ID and secret ID. Designed for CI/CD pipelines and services.
- Kubernetes auth: Pods authenticate using their service account JWT. No secret distribution needed.
- AWS/GCP/Azure auth: Cloud workloads authenticate using instance metadata. Zero pre-shared secrets.
The key principle: the identity of the requester determines what secrets they can access. A payment service authenticates as payment-svc and gets database credentials for the payments database. It cannot read secrets belonging to the user service.
Dynamic secrets
Static secrets are stored and retrieved. Dynamic secrets are generated on demand. When your application asks Vault for database credentials, Vault creates a new database user with a unique password, grants it the minimum required permissions, and hands it back with a lease.
sequenceDiagram
    participant App as Application
    participant V as Vault
    participant DB as Database
    App->>V: Authenticate (AppRole / K8s SA)
    V->>V: Verify identity, check policy
    V->>DB: CREATE USER app_xyz WITH PASSWORD 'generated'
    V->>DB: GRANT SELECT, INSERT ON payments TO app_xyz
    V-->>App: Credentials: app_xyz / generated (TTL: 1h)
    Note over App: Uses credentials for 1 hour
    App->>V: Renew lease (before expiry)
    V-->>App: Lease extended (TTL: 1h)
    Note over App: Lease expires without renewal
    V->>DB: DROP USER app_xyz
Vault creates a unique database user for each requesting application, with automatic cleanup on lease expiry.
Every credential is unique to the requesting service instance. If one pod is compromised, revoke its lease. The other pods are unaffected. Compare this to a shared static password where compromise means rotating credentials for every consumer simultaneously.
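The isolation property is easier to see in a toy model. The sketch below is an in-memory stand-in, not Vault's API: the class `ToyDynamicSecrets`, its method names, and the lease ID format are all hypothetical. It only shows that each request yields a distinct credential and that revoking one lease leaves the others intact.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Lease:
    lease_id: str
    username: str
    password: str
    expires_at: float


class ToyDynamicSecrets:
    """In-memory stand-in for a dynamic secrets engine (not Vault's API)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl_seconds = ttl_seconds
        self.leases = {}  # lease_id -> Lease

    def issue(self, role, now=None):
        """Create a unique credential for one requester and record its lease."""
        now = time.time() if now is None else now
        suffix = secrets.token_hex(4)
        lease = Lease(
            lease_id=f"database/creds/{role}/{suffix}",
            username=f"{role}_{suffix}",
            password=secrets.token_urlsafe(24),
            expires_at=now + self.ttl_seconds,
        )
        self.leases[lease.lease_id] = lease
        return lease

    def revoke(self, lease_id):
        """Drop a single lease; a real engine would also drop the database user."""
        self.leases.pop(lease_id, None)

    def is_valid(self, lease_id, now=None):
        """A lease is valid if it exists and its TTL has not elapsed."""
        now = time.time() if now is None else now
        lease = self.leases.get(lease_id)
        return lease is not None and now < lease.expires_at
```

Issuing twice for the same role gives two pods two different usernames. Calling `revoke` on the first pod's lease ID invalidates only that pod's credential.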
Leases and TTLs
Every dynamic secret comes with a lease. The lease has a time-to-live (TTL). When the TTL expires, Vault revokes the credential automatically. Applications must renew their lease before expiry or obtain new credentials.
Short TTLs limit the blast radius of a compromised credential. A stolen password that expires in one hour is far less dangerous than one that never expires.
| TTL | Use case | Tradeoff |
|---|---|---|
| 5 minutes | CI/CD pipeline credentials | High rotation overhead, minimal exposure |
| 1 hour | Application database access | Balanced for most workloads |
| 24 hours | Long-running batch jobs | More exposure, less disruption |
| 30 days | Human developer access | Requires strong monitoring |
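Whatever TTL you pick, the application needs a rule for when to renew. The sketch below assumes a common heuristic of renewing once two thirds of the TTL has elapsed; that fraction and the function names are assumptions for illustration, not something Vault prescribes.

```python
def renewal_deadline(issued_at, ttl_seconds, fraction=2 / 3):
    """Return the time at which the lease should be renewed."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return issued_at + ttl_seconds * fraction


def next_action(now, issued_at, ttl_seconds, fraction=2 / 3):
    """Decide what a credential holder should do at time `now`."""
    if now >= issued_at + ttl_seconds:
        return "reacquire"  # lease already expired; request new credentials
    if now >= renewal_deadline(issued_at, ttl_seconds, fraction):
        return "renew"      # still valid, but inside the renewal window
    return "wait"
```

Renewing well before expiry leaves room for a failed renewal attempt to be retried while the current credential still works.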
Policies
Vault policies are deny-by-default. You explicitly grant each identity access to specific secret paths with specific capabilities.
# Policy: payment-service
path "database/creds/payments-readonly" {
  capabilities = ["read"]
}

path "secret/data/payment-service/*" {
  capabilities = ["read", "list"]
}

# Explicitly denied, even though it matches a wildcard
path "secret/data/payment-service/admin-override" {
  capabilities = ["deny"]
}
This is least privilege in practice. The payment service reads its own secrets and nothing else. An engineer debugging the user service cannot accidentally (or intentionally) read payment credentials.
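To make the evaluation order tangible, here is a deliberately simplified evaluator for the policy above. It approximates the behavior described (deny by default, the most specific rule wins, deny overrides) and is not Vault's actual matching algorithm; `POLICY`, `_matches`, and `is_allowed` are hypothetical names.

```python
# The payment-service policy from above, as path pattern -> capabilities.
POLICY = {
    "database/creds/payments-readonly": {"read"},
    "secret/data/payment-service/*": {"read", "list"},
    "secret/data/payment-service/admin-override": {"deny"},
}


def _matches(pattern, path):
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return pattern == path


def is_allowed(policy, path, capability):
    """Deny by default; the most specific matching rule decides."""
    matching = [p for p in policy if _matches(p, path)]
    if not matching:
        return False  # no rule mentions this path
    # Prefer exact rules over globs, then longer patterns over shorter ones.
    best = max(matching, key=lambda p: (not p.endswith("*"), len(p)))
    caps = policy[best]
    return "deny" not in caps and capability in caps
```

With this model, a read of `secret/data/payment-service/admin-override` is refused even though the wildcard rule would otherwise allow it, and any path outside the three rules is refused because nothing grants it.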
Kubernetes secrets and their limits
Kubernetes has a built-in Secret resource. It is better than environment variables but has significant gaps.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: cGF5bWVudHM=
  password: czNjcjN0cEBzcw==
Those values are base64-encoded, not encrypted. Anyone with kubectl get secret access reads them in plain text. The etcd datastore holds them unencrypted by default.
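A few lines of standard-library Python are enough to recover the values from the manifest above. The helper name `decode_secret_data` is illustrative.

```python
import base64

# Values copied from the Secret manifest above.
secret_data = {
    "username": "cGF5bWVudHM=",
    "password": "czNjcjN0cEBzcw==",
}


def decode_secret_data(data):
    """Decode every base64 value in a Secret's `data` map to text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}
```

Decoding needs no key, which is why read access to the Secret object is equivalent to read access to the credential itself.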
Kubernetes secrets work as a delivery mechanism. They do not work as a secrets manager. Use them to inject secrets that a real secrets manager provides.
Hardening Kubernetes secrets:
- Enable etcd encryption at rest
- Use RBAC to restrict get and list on Secret resources
- Integrate with an external tool that pulls from Vault or a cloud secret manager: External Secrets Operator syncs values into Kubernetes Secret objects, while Vault Agent Injector writes them into the pod's filesystem without creating a Secret object
- Audit secret access via the Kubernetes audit log
graph LR
    V["Vault"] -->|"Sync"| ESO["External Secrets<br/>Operator"]
    ESO -->|"Create/Update"| KS["Kubernetes Secret"]
    KS -->|"Mount as volume<br/>or env var"| POD["Application Pod"]
External Secrets Operator bridges Vault and Kubernetes, keeping secrets synchronized.
Secret rotation
Rotation is the process of replacing an active credential with a new one. The old credential is revoked after a grace period. Rotation limits how long a compromised secret remains useful.
Manual rotation is error-prone and rarely happens on schedule. Automated rotation follows a predictable pattern:
1. Generate new credential
2. Update the secrets manager
3. Wait for consumers to pick up the new credential
4. Verify all consumers are using the new credential
5. Revoke the old credential
The hard part is step 3. Applications must handle credential changes gracefully. Connection pools need to reconnect. Cached tokens need to refresh. If your app reads the database password once at startup and caches it forever, rotation breaks it.
Design for rotation from the start:
- Use connection pools that reconnect on authentication failure
- Watch for secret changes (file watches, Vault lease renewal)
- Handle the “credential temporarily invalid” window with retries
- Test rotation in staging before enabling it in production
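The first and third points in that list can be sketched together: fetch credentials on every attempt rather than caching them, and retry when the database rejects them. `AuthenticationError`, `fetch_credentials`, and `connect` are hypothetical stand-ins for whatever your driver and secrets client provide.

```python
import time


class AuthenticationError(Exception):
    """Raised by the (hypothetical) connect function when credentials are rejected."""


def connect_with_refresh(fetch_credentials, connect, max_attempts=3, backoff_seconds=0.5):
    """Connect using fresh credentials, refetching them after an auth failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        username, password = fetch_credentials()  # never cached across attempts
        try:
            return connect(username, password)
        except AuthenticationError as exc:
            last_error = exc
            # Give the rotation a moment to propagate before trying again.
            time.sleep(backoff_seconds * (attempt + 1))
    raise last_error
```

Only authentication failures trigger a refetch here. Other errors propagate immediately, so a network outage is not mistaken for a rotation.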
Audit trails
Every secret access should produce an audit record. Vault writes detailed audit logs:
{
  "type": "response",
  "auth": {
    "client_token": "hmac-sha256:abc123",
    "accessor": "hmac-sha256:def456",
    "display_name": "kubernetes-payment-svc",
    "policies": ["payment-service"]
  },
  "request": {
    "id": "7f2a9c1b-...",
    "operation": "read",
    "path": "database/creds/payments-readonly",
    "remote_address": "10.0.42.17"
  },
  "response": {
    "data": {
      "username": "hmac-sha256:ghi789"
    }
  }
}
Note that Vault HMACs sensitive fields. The audit log records that a secret was accessed without exposing the secret value itself.
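The idea behind those `hmac-sha256:` values can be sketched with the standard library. The key below is a made-up placeholder (in Vault the key belongs to the audit device), and the function names are hypothetical; the point is only that a keyed hash lets you check whether a known value appears in the log without the log ever containing it.

```python
import hashlib
import hmac


def audit_hash(key, value):
    """Return an HMAC-SHA256 digest formatted like the audit entries above."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"hmac-sha256:{digest}"


def matches_logged_value(key, candidate, logged):
    """Check whether `candidate` is the value behind a logged digest."""
    return hmac.compare_digest(audit_hash(key, candidate), logged)
```

During an investigation, this is how a suspected leaked token can be matched against audit entries: hash the candidate with the same key and compare digests.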
Feed audit logs into your SIEM or logging pipeline. Set up alerts for:
- Access to high-value secret paths outside business hours
- Unusually high secret read rates from a single identity
- Failed authentication attempts
- Policy violations (denied access attempts)
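As a starting point for the read-rate alert, the sketch below counts reads per identity. It assumes the audit log is one JSON object per line with the fields shown in the example entry above; the function names and the threshold are illustrative, and a real deployment would more likely express this as a query in the SIEM.

```python
import json
from collections import Counter


def read_counts(audit_lines):
    """Count read operations per identity from JSON-lines audit entries."""
    counts = Counter()
    for line in audit_lines:
        entry = json.loads(line)
        # Count only response entries so one operation is not counted twice.
        if entry.get("type") != "response":
            continue
        if entry.get("request", {}).get("operation") == "read":
            identity = entry.get("auth", {}).get("display_name", "unknown")
            counts[identity] += 1
    return counts


def noisy_identities(audit_lines, threshold):
    """Return identities whose read count exceeds `threshold`, sorted by name."""
    counts = read_counts(audit_lines)
    return sorted(name for name, n in counts.items() if n > threshold)
```

A sensible threshold depends on the workload: a service that fetches credentials once per hour looks very different from one that is compromised and enumerating paths.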
Practical patterns
CI/CD pipeline secrets
Pipelines need credentials to deploy. Use short-lived tokens issued per pipeline run. GitHub Actions has OIDC federation that lets workflows authenticate to Vault or cloud secret managers without storing any long-lived credential.
# GitHub Actions: authenticate to Vault via OIDC
# The job also needs `permissions: id-token: write` so GitHub issues the OIDC token
- uses: hashicorp/vault-action@v2
  with:
    url: https://vault.company.com
    method: jwt
    role: ci-deploy
    secrets: |
      secret/data/deploy/production api_key | DEPLOY_API_KEY
Each pipeline run gets its own short-lived Vault token, with a lifetime capped by the TTL configured on the ci-deploy role. No static secrets in your CI/CD configuration.
Application startup
Applications should authenticate to the secrets manager early and fail fast if authentication fails. Do not fall back to hardcoded defaults for credentials.
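import hvac

# Path where Kubernetes mounts the pod's service account token
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

client = hvac.Client(url="https://vault.internal:8200")

# Kubernetes auth needs the service account JWT as well as the role
with open(SA_TOKEN_PATH) as f:
    client.auth.kubernetes.login(role="payment-svc", jwt=f.read())

# Ask Vault for a fresh, leased database credential
creds = client.secrets.database.generate_credentials(name="payments-readonly")
db_user = creds["data"]["username"]
db_pass = creds["data"]["password"]
lease_id = creds["lease_id"]  # needed later to renew or revoke the lease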
Emergency revocation
When a breach is detected, you need to revoke every secret the compromised workload could have obtained. Vault supports revoking all leases under a path prefix:
vault lease revoke -prefix database/creds/payments-readonly
This immediately invalidates every credential issued under that path. Applications will fail to authenticate on their next database query and must re-authenticate to Vault for fresh credentials.
Painful? Yes. But controlled pain is better than undetected compromise.
Common mistakes
Logging secrets. Middleware that logs all request headers will log Authorization tokens. Audit your logging configuration. Use secret-scanning tools in CI to catch accidental exposure.
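One defensive layer is a logging filter that scrubs credentials before a record is written. The sketch below uses Python's standard `logging` module; the regular expression and the class name `RedactAuthFilter` are illustrative assumptions, and it only handles the common `Authorization: <scheme> <credential>` shape.

```python
import logging
import re

# Matches "Authorization: <scheme> <credential>" and captures the header name and scheme.
_AUTH_HEADER = re.compile(r"(Authorization:\s*\w+\s+)\S+", re.IGNORECASE)


def redact(text):
    """Replace the credential part of any Authorization header with a marker."""
    return _AUTH_HEADER.sub(r"\1[REDACTED]", text)


class RedactAuthFilter(logging.Filter):
    """Logging filter that scrubs Authorization credentials from each record."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already fully formatted
        return True
```

Attach it to the handler with `handler.addFilter(RedactAuthFilter())`. Treat it as a safety net rather than a replacement for not logging headers in the first place.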
Shared service accounts. Ten services sharing one database credential means you cannot revoke access for one without breaking nine others. Issue unique credentials per service.
Skipping rotation in staging. If you only rotate in production, production is the first place you discover rotation bugs. Test the full lifecycle in lower environments.
No break-glass procedure. When Vault is down, how do your services authenticate? Have an emergency procedure documented and tested. Options include a rehearsed recovery procedure for a sealed Vault, emergency static credentials in a physical safe, or a secondary secrets manager.
What comes next
You know how to manage what changes between environments and how to protect the credentials those environments use. The next question is how to actually push new code into those environments safely. That is the domain of deployment strategies: blue-green, canary, rolling updates, and the tradeoffs each one carries.