
Introduction to Infrastructure as Code

In this series (10 parts)
  1. Introduction to Infrastructure as Code
  2. Terraform fundamentals
  3. Terraform state management
  4. Terraform modules
  5. Terraform in CI/CD
  6. Ansible fundamentals
  7. Ansible roles and best practices
  8. Packer for machine images
  9. CloudFormation and CDK
  10. Managing drift and compliance

Every team that runs infrastructure in the cloud starts the same way. Someone logs into the console, clicks through a wizard, and launches a virtual machine. It works. Then the second engineer needs an identical setup. They follow a wiki page with 27 steps, miss step 14, and spend two hours debugging why the security group rules do not match production. This is the problem Infrastructure as Code solves.

Why clicking is not repeatable

Manual provisioning has three failure modes that compound over time.

  1. No audit trail. The console does not record why a change was made, only that it happened. Six months later nobody knows whether that open port was intentional.
  2. Configuration drift. Two environments that started identical diverge as engineers apply one-off fixes to staging but forget to replicate them in production.
  3. Toil scales linearly. If standing up one environment takes two hours of clicking, standing up ten takes twenty hours. Automation inverts this: the first environment takes longer to define, but every subsequent one is a single command.

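IaC addresses all three problems by treating infrastructure definitions as source code. You write files, commit them to version control, review them in pull requests, and apply them through a pipeline. The commit history supplies the audit trail, a shared definition keeps environments aligned, and a new environment becomes a re-run rather than a re-click. The same workflow you already use for application code now governs your servers, networks, and databases.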
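As a minimal sketch of what "infrastructure as source code" can look like, the fragment below defines a security group whose open port carries its own justification. The variable names `environment` and `vpc_id` and the rule itself are assumptions made for illustration; they do not come from a real setup.

```hcl
# Hypothetical inputs; the names are illustrative.
variable "environment" {
  type        = string
  description = "Environment name, for example staging or production"
}

variable "vpc_id" {
  type        = string
  description = "ID of an existing VPC"
}

resource "aws_security_group" "web" {
  name        = "web-${var.environment}"
  description = "Web tier for ${var.environment}"
  vpc_id      = var.vpc_id

  # The description, plus the commit that adds this block,
  # records why the port is open.
  ingress {
    description = "HTTPS from the internet for the public site"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Environment = var.environment
  }
}
```

With a separate state per environment (a workspace or a separate backend key, for example), applying the same files with a different `environment` value should yield a second, matching copy instead of a second round of clicking.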

Core principles of IaC

Declarative vs imperative

A declarative approach describes the desired end state. You say “I want three EC2 instances behind a load balancer” and the tool figures out what to create, modify, or destroy. Terraform and CloudFormation work this way.
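A small sketch of the declarative style, assuming hypothetical `ami_id` and `subnet_id` inputs and leaving the load balancer out for brevity. The configuration states how many instances should exist, not how to create them.

```hcl
# Hypothetical inputs for illustration.
variable "ami_id" {
  type = string
}

variable "subnet_id" {
  type = string
}

# Desired end state: three identical instances.
resource "aws_instance" "app" {
  count         = 3
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = var.subnet_id

  tags = {
    Name = "app-${count.index}"
  }
}
```

Changing `count` to 5 and planning again should propose creating two instances; changing it to 2 should propose destroying the one with the highest index. In both cases the tool derives the steps from the difference between the definition and what exists.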

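An imperative approach describes the steps to reach that state. You say “create instance A, then create instance B, then attach both to a load balancer.” Custom scripts that call a cloud CLI or SDK work this way. Pulumi blurs the line: you write the program in a general-purpose language, which feels imperative, but its engine still computes a desired state graph and diffs it against what exists, so the underlying model is declarative.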

Declarative wins for most infrastructure work because the tool handles ordering and dependency resolution. Imperative wins when you need complex conditional logic or dynamic generation that a declarative DSL cannot express cleanly.
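To show where the boundary sits, here is a hedged sketch of the conditional and dynamic constructs HCL does offer. All variable names are hypothetical. Logic that goes much beyond this shape is where a general-purpose language tends to become more comfortable.

```hcl
# Hypothetical inputs for illustration.
variable "vpc_id" {
  type = string
}

variable "ami_id" {
  type = string
}

variable "enable_bastion" {
  type    = bool
  default = false
}

variable "subnets" {
  type = map(string)
  default = {
    "us-east-1a" = "10.0.1.0/24"
    "us-east-1b" = "10.0.2.0/24"
  }
}

# Dynamic generation: one subnet per map entry.
resource "aws_subnet" "public" {
  for_each          = var.subnets
  vpc_id            = var.vpc_id
  availability_zone = each.key
  cidr_block        = each.value
}

# Conditional creation: zero or one instance, depending on a flag.
resource "aws_instance" "bastion" {
  count         = var.enable_bastion ? 1 : 0
  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public["us-east-1a"].id
}
```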

Idempotency

Running the same IaC definition twice should produce the same result. If the infrastructure already matches the definition, the tool should change nothing. This is idempotency. It means you can safely re-run your deployment pipeline after a network timeout without creating duplicate resources.

Terraform achieves idempotency through its state file. During a plan it refreshes the recorded state from the real resources, compares the result against the desired configuration, computes a diff, and applies only the changes. If there is no diff, there is no change.
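Because the comparison depends on the recorded state, every run needs to see the same state file. One common arrangement, sketched here with a hypothetical bucket name and key, is to keep state in a remote backend so a pipeline re-run reads what the previous run wrote.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"    # hypothetical bucket name
    key     = "iac-demo/terraform.tfstate" # hypothetical object key
    region  = "us-east-1"
    encrypt = true
  }
}
```

This is only one option. State kept on a single laptop works for a demo, but a pipeline running elsewhere would start from empty state and try to create everything again.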

Drift detection

Real infrastructure drifts from its definition. Someone adds a firewall rule through the console. A scaling event creates resources the IaC tool does not know about. Good IaC tooling detects this drift and gives you options: import the change into your definition, or revert the infrastructure back to the declared state.
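The "import the change" option can be sketched with an `import` block, available in Terraform 1.5 and later. The security group ID, resource name, and `vpc_id` variable below are hypothetical placeholders.

```hcl
# Hypothetical input for illustration.
variable "vpc_id" {
  type = string
}

# Adopt a security group that was created by hand in the console.
import {
  to = aws_security_group.console_created
  id = "sg-0123456789abcdef0" # hypothetical ID
}

# The definition should mirror the real resource; any mismatch
# shows up in the next plan as a proposed change.
resource "aws_security_group" "console_created" {
  name        = "console-created-sg"
  description = "Created by hand, now managed by Terraform"
  vpc_id      = var.vpc_id
}
```

The alternative, reverting, needs no extra configuration: if the manual change touched a resource that is already managed, the next apply moves it back to the declared state.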

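Terraform’s terraform plan command doubles as a drift detector. It reads the current state of every managed resource and compares it to your configuration files, so any difference shows up as a planned change. Running terraform plan -refresh-only reports only the differences between the recorded state and the real resources. Resources created entirely outside Terraform stay invisible to it until you import them.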
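Not all drift is a mistake. A scaling event that changes the size of an Auto Scaling group is expected, and one way to keep the plan quiet about it is `ignore_changes`. The sketch below assumes a hypothetical launch template ID and subnet list supplied as variables.

```hcl
# Hypothetical inputs for illustration.
variable "launch_template_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_autoscaling_group" "web" {
  name                = "iac-demo-web-asg"
  min_size            = 1
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  # Scaling policies adjust this value at runtime, so later changes
  # to it are not treated as drift.
  lifecycle {
    ignore_changes = [desired_capacity]
  }
}
```

The cost of this choice is that Terraform no longer corrects that attribute, so it is best limited to values another system legitimately owns.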

The IaC apply lifecycle

Every IaC tool follows a similar lifecycle when you run an apply.

graph TD
  A[Write Config Files] --> B[Initialize Provider Plugins]
  B --> C[Read Current State]
  C --> D[Build Dependency Graph]
  D --> E[Compute Diff Plan]
  E --> F{Review Plan}
  F -->|Approve| G[Apply Changes]
  F -->|Reject| H[Abort]
  G --> I[Update State File]
  I --> J[Output Resource IDs]

The IaC apply lifecycle from config authoring through state update.

The plan step is critical. It shows you exactly what will be created, modified, or destroyed before any change touches real infrastructure. This is the safety net that makes IaC less risky than manual changes, not more.
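Human review of the plan can be backed by a guard in the configuration itself. As a sketch, with a hypothetical bucket name, `prevent_destroy` makes Terraform fail with an error on any plan that would destroy the resource.

```hcl
resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-audit-logs" # hypothetical bucket name

  # A plan that includes destroying this bucket is rejected with an error.
  lifecycle {
    prevent_destroy = true
  }
}
```

This does not replace reading the plan. It only catches one class of mistake, and removing the whole resource block from the configuration removes the guard along with it.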

Terraform in practice

Terraform uses HCL (HashiCorp Configuration Language) to define resources. Here is a complete configuration that provisions an AWS VPC, a public subnet with the internet gateway and route table that make it public, and an EC2 instance.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  required_version = ">= 1.5.0"
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name        = "iac-demo-vpc"
    Environment = "development"
  }
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true

  tags = {
    Name = "iac-demo-public-subnet"
  }
}

resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "iac-demo-igw"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gw.id
  }

  tags = {
    Name = "iac-demo-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

resource "aws_instance" "web" {
  # AMI IDs are region-specific: replace this with a current image ID for your region.
  ami           = "ami-0c02fb55956c7d316"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.public.id

  tags = {
    Name = "iac-demo-web"
  }
}

output "instance_public_ip" {
  value = aws_instance.web.public_ip
}

Terraform reads this file, resolves the dependency chain (the instance depends on the subnet, which depends on the VPC), and creates resources in the correct order. Destroying them reverses the order automatically.
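Terraform infers that ordering from references such as `aws_subnet.public.id`. When a dependency exists but no attribute is referenced, it can be declared explicitly. The fragment below is a sketch meant to sit alongside the configuration above, since it reuses `aws_instance.web` and `aws_internet_gateway.gw`. The Elastic IP itself is an addition for illustration, not part of the original example.

```hcl
resource "aws_eip" "web" {
  domain   = "vpc"
  instance = aws_instance.web.id # implicit dependency through the reference

  # No gateway attribute is used here, so the ordering is stated explicitly:
  # the gateway should exist before the address is associated.
  depends_on = [aws_internet_gateway.gw]
}
```

Explicit `depends_on` is best kept for cases like this one. Wherever a reference already expresses the relationship, the implicit form is enough.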

Comparing IaC tools

Three tools dominate the IaC space today. Each makes different tradeoffs.

| Feature | Terraform | Pulumi | AWS CDK |
| --- | --- | --- | --- |
| Language | HCL (domain-specific) | TypeScript, Python, Go, etc. | TypeScript, Python, Java, etc. |
| State management | State file (local or remote) | Pulumi Cloud or self-managed | CloudFormation stack |
| Cloud support | Multi-cloud | Multi-cloud | AWS only |
| Learning curve | Low for simple infra | Familiar if you know the language | Familiar if you know AWS |
| Drift detection | Built-in via plan | Built-in via preview | Limited (stack drift detection) |

Terraform is the default choice for teams that manage infrastructure across multiple clouds or want a clear separation between application code and infrastructure code. HCL is simple enough that operations engineers who do not write application code can still contribute.

Pulumi fits teams that want to use their existing programming language for infrastructure. If your backend is in TypeScript and you want to define infrastructure in TypeScript with loops, conditionals, and shared libraries, Pulumi removes the context switch. The tradeoff is that debugging infrastructure failures now requires understanding both the Pulumi runtime and your code.

AWS CDK is the right choice if you are fully committed to AWS and want the tightest integration with CloudFormation. CDK constructs provide high-level abstractions that bundle multiple raw resources into a single logical unit. The tradeoff is vendor lock-in.

When to use each

Pick Terraform when you need multi-cloud support, your team includes people who are not full-time developers, or you want the largest ecosystem of community modules.
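To give a sense of what the module ecosystem buys you, here is a hedged sketch that replaces the hand-written VPC, subnet, gateway, and route table from the earlier example with a community module from the public registry. The input names reflect that module's interface as I understand it, so check its registry documentation before relying on them.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "iac-demo-vpc"
  cidr = "10.0.0.0/16"

  azs            = ["us-east-1a"]
  public_subnets = ["10.0.1.0/24"]
}
```

The tradeoff is the usual one for shared code: less to write and maintain, in exchange for depending on someone else's defaults and release schedule.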

Pick Pulumi when your infrastructure logic is complex enough to benefit from a real programming language, your team already knows TypeScript or Python well, and you want to share types between your application and your infrastructure.

Pick AWS CDK when you run exclusively on AWS, you want official AWS-maintained constructs, and your team is comfortable with CloudFormation as the underlying deployment engine.

There is no universal winner. The best tool is the one your team will actually maintain. IaC that nobody updates is worse than no IaC at all, because it gives false confidence that the declared state matches reality.

What comes next

This article covered why IaC matters and how the major tools compare. The rest of this series goes deeper. The next three articles walk through Terraform’s fundamentals, state management, and module system. After that we cover Terraform in CI/CD, Ansible, Packer, CloudFormation and CDK, and managing drift and compliance.

The goal by the end of the series is a complete, production-grade workflow: infrastructure defined in code, validated by tests, deployed through a pipeline, and monitored for drift.
