Cloud IAM and access control
A misconfigured IAM policy once gave every employee in a 3,000-person company full admin access to production databases. Nobody noticed for seven months. The breach that followed cost the company $14 million. IAM is the front door to your cloud environment. Get it wrong and nothing else matters.
Why IAM exists
Cloud providers host millions of tenants on shared infrastructure. Every API call needs an answer to two questions: who is making this request, and are they allowed to do it? IAM handles both. It authenticates callers, then evaluates policies to authorize or deny each action. Without IAM, you would need to build authentication and authorization into every single service.
Principals: users, roles, and service accounts
Most cloud IAM systems distinguish three kinds of principal. Each serves a different purpose.
Users represent human beings. You create a user, attach credentials, and a person signs in. Users work for interactive access like console logins and CLI sessions. They should never be used by applications.
Roles are identity containers without permanent credentials. A principal assumes a role temporarily and receives short-lived credentials. Roles shine when you need to delegate access across boundaries. An application running on a virtual machine assumes a role. A developer in Account A assumes a role in Account B.
Service accounts are identities for workloads. In GCP they are first-class objects. In AWS the equivalent pattern uses IAM roles attached to compute resources. The key principle is the same: machines get their own identity, separate from any human.
Principal type    Credentials          Typical consumer
---------------------------------------------------------
User              Password + MFA       Human at console
Role              Temporary token      App, cross-account
Service account   Key pair or token    Background job, CI
Least privilege
Least privilege means granting only the permissions a principal actually needs and nothing more. It sounds obvious. In practice teams hand out broad policies because narrow ones take effort to define.
Start with zero permissions and add incrementally. Use access logs to see what a principal actually calls. Cloud providers offer access analyzer tools that compare granted permissions against used permissions and flag the gap. Run these reports on a regular schedule and remove whatever goes unused.
A common anti-pattern is granting * on all resources during development and forgetting to tighten it before shipping to production. Automate policy review in your CI pipeline to catch this before it reaches the cloud.
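A minimal sketch of such a CI check, assuming policies are stored as AWS-style JSON documents in the repository. The function name `find_wildcard_grants` and the example policy are illustrative, not part of any provider tooling.

```python
import json


def find_wildcard_grants(policy):
    """Return the indexes of Allow statements granting "*" actions on "*" resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]

    flagged = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions and "*" in resources:
            flagged.append(index)
    return flagged


# Hypothetical policy: statement 0 is scoped, statement 1 is the leftover dev wildcard.
example_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::app-assets/*"},
    {"Effect": "Allow", "Action": "*", "Resource": "*"}
  ]
}
""")

flagged_statements = find_wildcard_grants(example_policy)
print(flagged_statements)  # [1]
```

A pipeline step could fail the build whenever the returned list is non-empty. The check is deliberately narrow: it catches only the bare `*` on `*` case, not service-wide wildcards such as `s3:*`.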
Identity-based vs resource-based policies
These two policy types control access from opposite directions.
Identity-based policies attach to a principal. They say “this user can read objects in bucket X.” The policy travels with the identity. When the user tries to access bucket X, the IAM engine checks the policy on the user.
Resource-based policies attach to a resource. They say “bucket X allows reads from account 123456789.” The policy lives on the resource. When anyone tries to access the bucket, the engine checks the policy on the bucket.
graph TD
    A["API Request"] --> B{"Authenticate caller"}
    B -->|"Valid"| C{"Gather policies"}
    B -->|"Invalid"| D["DENY"]
    C --> E{"Identity-based<br/>policies"}
    C --> F{"Resource-based<br/>policies"}
    E --> G{"Evaluate"}
    F --> G
    G -->|"Explicit deny found"| D
    G -->|"No explicit deny"| H{"Any explicit allow?"}
    H -->|"Yes"| I["ALLOW"]
    H -->|"No"| D
    style D fill:#f44,color:#fff
    style I fill:#4a4,color:#fff
IAM policy evaluation logic. An explicit deny always wins. Without an explicit allow, the default is deny.
The two types combine. In AWS, for example, a request made within a single account succeeds when at least one applicable policy allows it and no policy denies it. Cross-account access is stricter: the identity-based policy in the caller's account and the resource-based policy on the target must both allow the action.
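The evaluation rules described above can be sketched in a few lines. This is a toy model, not a provider's engine: the `Statement` class and `evaluate` function are hypothetical, and matching is exact string equality, with no wildcards or conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    effect: str    # "Allow" or "Deny"
    action: str    # e.g. "s3:GetObject"
    resource: str  # e.g. "bucket-x"


def _has(statements, effect, action, resource):
    """True if any statement has this effect for exactly this action and resource."""
    return any(
        s.effect == effect and s.action == action and s.resource == resource
        for s in statements
    )


def evaluate(action, resource, identity_policies, resource_policies, cross_account=False):
    # 1. An explicit deny in any policy wins outright.
    if _has(identity_policies + resource_policies, "Deny", action, resource):
        return "DENY"
    identity_allows = _has(identity_policies, "Allow", action, resource)
    resource_allows = _has(resource_policies, "Allow", action, resource)
    # 2. Same account: one allow is enough. Cross-account: both sides must allow.
    if cross_account:
        allowed = identity_allows and resource_allows
    else:
        allowed = identity_allows or resource_allows
    # 3. No allow means the implicit default applies: deny.
    return "ALLOW" if allowed else "DENY"


read_x = Statement("Allow", "s3:GetObject", "bucket-x")
deny_x = Statement("Deny", "s3:GetObject", "bucket-x")

print(evaluate("s3:GetObject", "bucket-x", [read_x], []))                      # ALLOW
print(evaluate("s3:GetObject", "bucket-x", [read_x], [], cross_account=True))  # DENY
print(evaluate("s3:GetObject", "bucket-x", [read_x], [deny_x]))                # DENY
```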
Cross-account access
Organizations split workloads across multiple accounts for isolation. The billing account should never host application servers. The production account should not contain developer sandboxes. But services still need to communicate across these boundaries.
The standard pattern uses a role in the target account whose trust policy names the principal in the source account that is allowed to assume it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/deployer"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
The deployer role in account 111111111111 calls AssumeRole on the target, receives temporary credentials, and operates within the target account. No long-lived keys cross account boundaries.
For GCP the equivalent is service account impersonation. For Azure it is managed identity with cross-subscription role assignments. The principle is identical: temporary credentials, explicit trust, no shared secrets.
Instance profiles and workload identity
Applications running on cloud compute need credentials. Hardcoding access keys is the fastest way to a security incident. Instance profiles and workload identity solve this by injecting credentials automatically.
On AWS, you attach an IAM role to an EC2 instance via an instance profile. The instance metadata service exposes temporary credentials at a well-known endpoint. The SDK picks them up transparently. No keys in environment variables. No keys in config files.
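Because these credentials are temporary, something has to notice when they are about to expire and fetch fresh ones. SDKs normally handle this for you. The sketch below only illustrates the idea, assuming a credential document with an `Expiration` timestamp in UTC. The field names follow the general shape of the instance metadata response as I understand it, and every value is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def parse_expiration(credentials):
    """Parse the ISO 8601 UTC timestamp in the "Expiration" field."""
    return datetime.strptime(credentials["Expiration"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )


def needs_refresh(credentials, now, margin=timedelta(minutes=5)):
    """True when the credentials have expired or will expire within `margin`."""
    return parse_expiration(credentials) - now <= margin


# Made-up credential document; every value here is a placeholder.
credentials = {
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-session-token",
    "Expiration": "2030-01-01T12:00:00Z",
}

noon = datetime(2030, 1, 1, 12, 0, tzinfo=timezone.utc)
print(needs_refresh(credentials, noon - timedelta(hours=1)))    # False
print(needs_refresh(credentials, noon - timedelta(minutes=3)))  # True
```

Refreshing slightly before expiry, rather than at it, avoids a request failing mid-flight with a token that lapsed a second earlier.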
On GCP, you assign a service account to a VM or Cloud Run service. The metadata server provides OAuth tokens. On Azure, managed identity works the same way.
For Kubernetes clusters, workload identity federation maps Kubernetes service accounts to cloud IAM roles. A pod gets a projected service account token. The cloud provider validates it and issues cloud credentials. This avoids storing any cloud keys inside the cluster.
Pod starts
--> K8s injects projected SA token
--> SDK in the pod presents the token to the cloud's token service
--> Cloud validates token against the cluster's OIDC provider
--> Cloud issues scoped temporary credentials
--> API call proceeds with those credentials
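To make the mapping step concrete, here is a rough sketch of how the claims in a projected token could be matched to a cloud role. It assumes the token is a JWT whose `sub` claim has the form `system:serviceaccount:<namespace>:<name>`. The issuer URL, audience, role name, and `ROLE_BINDINGS` table are all hypothetical, and the sketch skips signature verification, which a real provider always performs first.

```python
import base64
import json


def _b64url(data):
    """Encode bytes as unpadded base64url, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_segment = token.split(".")[1]
    padding = "=" * (-len(payload_segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_segment + padding))


# Hypothetical trust configuration: (issuer, subject) -> cloud role.
ROLE_BINDINGS = {
    ("https://oidc.example.com/cluster-1", "system:serviceaccount:payments:api"): "payments-api-role",
}


def role_for(claims, expected_audience):
    """Return the bound role, or None if the audience or identity is not trusted."""
    audience = claims.get("aud", [])
    audiences = [audience] if isinstance(audience, str) else audience
    if expected_audience not in audiences:
        return None
    return ROLE_BINDINGS.get((claims.get("iss"), claims.get("sub")))


# Build an unsigned demo token so the example is self-contained.
demo_claims = {
    "iss": "https://oidc.example.com/cluster-1",
    "sub": "system:serviceaccount:payments:api",
    "aud": ["cloud-sts.example"],
}
demo_token = ".".join([
    _b64url(json.dumps({"alg": "none"}).encode()),
    _b64url(json.dumps(demo_claims).encode()),
    "",  # empty signature segment
])

print(role_for(decode_claims(demo_token), "cloud-sts.example"))  # payments-api-role
print(role_for(decode_claims(demo_token), "some-other-audience"))  # None
```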
Policy conditions
Conditions add context-aware restrictions to policies. Instead of “allow this action,” you say “allow this action only when certain conditions are met.”
Common conditions include:
- Source IP: restrict access to corporate network ranges
- MFA present: require multi-factor before sensitive operations
- Time of day: block production changes outside maintenance windows
- Resource tags: allow actions only on resources with specific tags
- Secure transport: require that requests arrive over TLS
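{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::sensitive-data/*",
      "Condition": {
        "Bool": {"aws:MultiFactorAuthPresent": "true"},
        "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
      }
    }
  ]
}
This policy allows S3 reads only when the caller has authenticated with MFA and the request originates from the listed IP range (203.0.113.0/24 is a placeholder; substitute your corporate egress range). Both conditions must be true. Note that aws:SourceIp matches the public address a request arrives from, so a private range such as 10.0.0.0/8 would not match requests coming over the internet. Traffic that arrives through a VPC endpoint is matched with aws:VpcSourceIp instead.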
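The "all conditions must hold" behavior can be sketched as a small evaluator. This is a simplified model covering only the `Bool` and `IpAddress` operators. The `conditions_met` function, the request context, and the 203.0.113.0/24 range are illustrative placeholders. Treating a missing context key as a failed condition is an assumption of this sketch, not a statement about how every real operator behaves.

```python
import ipaddress


def conditions_met(condition_block, request_context):
    """Return True only if every operator/key pair in the block is satisfied."""
    for operator, expectations in condition_block.items():
        for key, expected in expectations.items():
            actual = request_context.get(key)
            if actual is None:
                return False  # assumption: a missing key never satisfies a condition
            if operator == "Bool":
                if str(actual).lower() != str(expected).lower():
                    return False
            elif operator == "IpAddress":
                if ipaddress.ip_address(actual) not in ipaddress.ip_network(expected):
                    return False
            else:
                raise ValueError(f"unsupported operator: {operator}")
    return True


condition = {
    "Bool": {"aws:MultiFactorAuthPresent": "true"},
    "IpAddress": {"aws:SourceIp": "203.0.113.0/24"},
}

with_mfa = {"aws:MultiFactorAuthPresent": "true", "aws:SourceIp": "203.0.113.25"}
without_mfa = {"aws:MultiFactorAuthPresent": "false", "aws:SourceIp": "203.0.113.25"}

print(conditions_met(condition, with_mfa))     # True
print(conditions_met(condition, without_mfa))  # False
```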
Permission boundaries
Permission boundaries cap what an identity can do, even if other policies grant broader access. They act as a ceiling. A developer might have a policy that allows creating IAM roles, but a permission boundary ensures those roles never exceed a predefined set of permissions.
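This solves the privilege escalation problem. Without boundaries, a user who can create roles could create a role with admin access and assume it. Boundaries prevent that when the developer is only permitted to create roles that carry the boundary (in AWS, this is enforced with the iam:PermissionsBoundary condition key on the role-creation permission). The boundary is not applied automatically, but once it is required, no new role can exceed it.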
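One way to picture the ceiling is as a set intersection: an action is usable only if the identity's policy and the boundary both allow it. The sketch below models permissions as plain sets of action names. The function names and the `dev-boundary` label are hypothetical.

```python
def effective_permissions(identity_allows, boundary_allows):
    """An action is usable only if both the identity policy and the boundary allow it."""
    return identity_allows & boundary_allows


def may_create_role(attached_boundary, required_boundary):
    """Guard against escalation: new roles must carry the required boundary."""
    return attached_boundary == required_boundary


developer_policy = {"iam:CreateRole", "s3:GetObject", "ec2:TerminateInstances"}
boundary = {"iam:CreateRole", "s3:GetObject", "s3:PutObject"}

print(sorted(effective_permissions(developer_policy, boundary)))
# ['iam:CreateRole', 's3:GetObject']
print(may_create_role(None, "dev-boundary"))            # False
print(may_create_role("dev-boundary", "dev-boundary"))  # True
```

Note that the boundary grants nothing on its own: `s3:PutObject` is inside the boundary but absent from the developer's policy, so it stays unavailable.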
Auditing and monitoring
IAM policies mean nothing if you cannot verify they work as intended. Enable cloud audit logs for every account. These logs record every API call with the caller identity, action, resource, and result.
Set up alerts for high-risk events:
- Root account usage
- Policy changes that add * permissions
- Role assumption from unexpected accounts
- Failed authentication attempts above a threshold
Feed these logs into a SIEM or log aggregation system. Review them during incident response and periodic access reviews. The log trail is your proof that access controls actually function.
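The alert rules listed above could be prototyped as a simple filter over audit records before moving them into a SIEM. The record fields used here (`principal`, `action`, `account`, `result`, `added_actions`) are a made-up schema for illustration; real audit log formats differ by provider.

```python
from collections import Counter


def find_alerts(events, trusted_accounts, failed_auth_threshold):
    """Scan audit events and return human-readable alerts for high-risk activity."""
    alerts = []
    failed_logins = Counter()
    for event in events:
        if event["principal"] == "root":
            alerts.append(f"root account used for {event['action']}")
        if event["action"] == "PutPolicy" and "*" in event.get("added_actions", []):
            alerts.append(f"{event['principal']} added wildcard permissions")
        if event["action"] == "AssumeRole" and event["account"] not in trusted_accounts:
            alerts.append(f"role assumed from unexpected account {event['account']}")
        if event["action"] == "Login" and event["result"] == "failure":
            failed_logins[event["principal"]] += 1
    # Failed logins are reported once per principal, after the scan.
    for principal, count in sorted(failed_logins.items()):
        if count > failed_auth_threshold:
            alerts.append(f"{principal} failed authentication {count} times")
    return alerts


failed_login = {"principal": "bob", "action": "Login", "account": "111111111111", "result": "failure"}
events = [
    {"principal": "root", "action": "DeleteBucket", "account": "111111111111", "result": "success"},
    {"principal": "alice", "action": "PutPolicy", "account": "111111111111", "result": "success",
     "added_actions": ["*"]},
    {"principal": "deployer", "action": "AssumeRole", "account": "999999999999", "result": "success"},
] + [failed_login] * 3

for alert in find_alerts(events, trusted_accounts={"111111111111"}, failed_auth_threshold=2):
    print(alert)
```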
Common mistakes
Using root or owner accounts for daily work. Root accounts bypass all IAM policies. Lock them behind MFA hardware tokens and never use them for routine tasks.
Long-lived access keys. Keys that never rotate eventually leak. Prefer roles and temporary credentials everywhere. If you must use keys, rotate them automatically on a short schedule.
Overly broad wildcard policies. Action: "*" on Resource: "*" is the IAM equivalent of removing your front door. Even in development environments, scope policies to the services actually used.
Not using groups or role hierarchies. Attaching policies directly to users creates an unmanageable mess at scale. Use groups to organize humans and roles to organize machine access.
Sharing service account keys between environments. A key that works in development and production means a compromised dev environment gives an attacker production access. Each environment gets its own service account with its own credentials.
Automating IAM reviews
Manual IAM reviews do not scale. Automate them.
Cloud providers offer access analyzer tools that compare granted permissions against actual usage over a defined period. Run these weekly. They will flag roles that have not been used in 90 days, permissions that were granted but never exercised, and policies that allow more access than the principal needs.
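The kind of report described here can be approximated with set arithmetic and a date comparison. This sketch assumes you have already exported, per role, the granted actions, the actions observed in logs, and a last-used timestamp. The `review_roles` function and the role data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def review_roles(roles, now, max_idle=timedelta(days=90)):
    """Return {role_name: findings} for roles that are stale or over-permissioned."""
    report = {}
    for name, role in roles.items():
        findings = []
        last_used = role["last_used"]
        if last_used is None or now - last_used > max_idle:
            findings.append(f"unused for more than {max_idle.days} days")
        unused = sorted(role["granted"] - role["used"])  # granted but never exercised
        if unused:
            findings.append("never exercised: " + ", ".join(unused))
        if findings:
            report[name] = findings
    return report


now = datetime(2030, 6, 1, tzinfo=timezone.utc)
roles = {
    "deployer": {
        "granted": {"s3:GetObject", "s3:PutObject", "s3:DeleteObject"},
        "used": {"s3:GetObject", "s3:PutObject"},
        "last_used": now - timedelta(days=2),
    },
    "legacy-batch": {
        "granted": {"sqs:SendMessage"},
        "used": set(),
        "last_used": now - timedelta(days=200),
    },
    "reporter": {
        "granted": {"s3:GetObject"},
        "used": {"s3:GetObject"},
        "last_used": now - timedelta(days=1),
    },
}

for role_name, findings in review_roles(roles, now).items():
    print(role_name, findings)
```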
Integrate policy validation into your CI/CD pipeline. Tools like Open Policy Agent (OPA) and provider-specific linting tools can evaluate IAM policies against your organization’s rules before they are deployed. A policy that grants admin access should fail the pipeline, not reach production.
Tag every IAM principal with an owner. When a team disbands or a project ends, you need to know which roles and service accounts to clean up. Orphaned principals with active permissions are a breach waiting to happen.
What comes next
With IAM controlling who can do what, the next article covers serverless architecture patterns. There, event-driven functions run without you managing servers, and IAM roles become even more critical because every function invocation carries its own identity.