Packer for machine images

In this series (10 parts)
  1. Introduction to Infrastructure as Code
  2. Terraform fundamentals
  3. Terraform state management
  4. Terraform modules
  5. Terraform in CI/CD
  6. Ansible fundamentals
  7. Ansible roles and best practices
  8. Packer for machine images
  9. CloudFormation and CDK
  10. Managing drift and compliance

Every time you deploy a new server, you run Ansible against it. Packages install. Configs render. Services start. The process takes eight minutes and occasionally fails halfway through because an apt mirror is down. Now multiply that by forty instances in an autoscaling group. Baking a machine image solves this. You run the configuration once, capture the result as an AMI, and every new instance launches in seconds with everything pre-installed.

Why bake images

The traditional approach provisions bare instances and configures them at boot time. This is called configuration convergence. It works, but it has weaknesses.

Boot time increases linearly with the number of packages to install. Network failures during provisioning leave instances in a broken state. Every instance independently downloads the same packages, wasting bandwidth. And you cannot easily test the exact artifact that will run in production.

Image baking flips the model. You build the image once in a controlled environment, test it, and deploy the tested artifact. Every instance that launches from that image is identical. There is no drift between instances because there is no runtime configuration step.

flowchart LR
  subgraph Convergence
      A1[Launch bare instance] --> A2[Run config management]
      A2 --> A3[Instance ready in ~8 min]
  end
  subgraph Baking
      B1[Build image with Packer] --> B2[Test image]
      B2 --> B3[Launch from image]
      B3 --> B4[Instance ready in ~30 sec]
  end

Image baking shifts configuration cost to build time. Launch time drops from minutes to seconds.

Packer template structure

Packer uses HCL2 templates (the same language as Terraform). A template has three main block types: variable blocks that declare inputs, source blocks that define where and how the build runs, and a build block that ties sources to provisioners.

# variables.pkr.hcl
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "instance_type" {
  type    = string
  default = "t3.micro"
}

variable "app_version" {
  type    = string
  default = "1.0.0"
}

variable "ami_name_prefix" {
  type    = string
  default = "myapp"
}

# sources.pkr.hcl
locals {
  # {{timestamp}} is legacy JSON-template syntax and is not interpolated
  # in HCL2; derive a sortable timestamp from timestamp() instead.
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "ubuntu" {
  ami_name      = "${var.ami_name_prefix}-${var.app_version}-${local.timestamp}"
  instance_type = var.instance_type
  region        = var.aws_region

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"] # Canonical
  }

  ssh_username = "ubuntu"

  tags = {
    Name        = "${var.ami_name_prefix}-${var.app_version}"
    Environment = "production"
    Builder     = "packer"
    AppVersion  = var.app_version
  }
}

# build.pkr.hcl
build {
  sources = ["source.amazon-ebs.ubuntu"]

  provisioner "shell" {
    inline = [
      "sudo apt-get update -y",
      "sudo apt-get upgrade -y",
      "sudo apt-get install -y curl wget unzip jq"
    ]
  }

  provisioner "ansible" {
    playbook_file = "./ansible/configure.yml"
    extra_arguments = [
      "--extra-vars", "app_version=${var.app_version}"
    ]
  }

  provisioner "shell" {
    inline = [
      "sudo apt-get clean",
      "sudo rm -rf /var/lib/apt/lists/*",
      "sudo rm -rf /home/ubuntu/.ssh/authorized_keys"
    ]
  }

  post-processor "manifest" {
    output     = "build-manifest.json"
    strip_path = true
  }
}

Builders

Builders create the temporary instance where Packer runs provisioners. Each cloud provider has its own builder.

Builder         Purpose
amazon-ebs      Create AWS AMIs backed by EBS volumes
googlecompute   Create GCP images
azure-arm       Create Azure managed images
docker          Build Docker images
vsphere-iso     Create VMware vSphere templates
qemu            Build images for KVM/QEMU

The amazon-ebs builder launches an EC2 instance, waits for SSH access, runs your provisioners, stops the instance, creates a snapshot, registers it as an AMI, and terminates the instance. The entire lifecycle is automated.

You can build for multiple platforms in a single template:

source "amazon-ebs" "ubuntu_east" {
  region = "us-east-1"
  # ... other config
}

source "amazon-ebs" "ubuntu_west" {
  region = "us-west-2"
  # ... other config
}

build {
  sources = [
    "source.amazon-ebs.ubuntu_east",
    "source.amazon-ebs.ubuntu_west"
  ]
  # provisioners run on both
}

Packer builds both images in parallel by default.
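Provisioners normally run against every source, but individual steps can be restricted with the provisioner-level only and except filters. A sketch, reusing the two sources above:

```hcl
build {
  sources = [
    "source.amazon-ebs.ubuntu_east",
    "source.amazon-ebs.ubuntu_west"
  ]

  # Runs on both sources.
  provisioner "shell" {
    inline = ["sudo apt-get update -y"]
  }

  # Restricted to the us-east-1 build.
  provisioner "shell" {
    only   = ["amazon-ebs.ubuntu_east"]
    inline = ["echo 'east-only step'"]
  }
}
```

The same names work on the command line: `packer build -only="amazon-ebs.ubuntu_east" .` builds a single source from a multi-source template.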

Provisioners

Provisioners install and configure software on the running instance. Packer supports several types.

Shell provisioner

The simplest option. Runs shell commands directly:

provisioner "shell" {
  inline = [
    "sudo apt-get update -y",
    "sudo apt-get install -y nginx"
  ]
}

For longer scripts, use an external file:

provisioner "shell" {
  script = "./scripts/install-app.sh"
  environment_vars = [
    "APP_VERSION=${var.app_version}",
    "DEPLOY_ENV=production"
  ]
}
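The script itself is ordinary shell that reads its inputs from the environment the provisioner sets. A hypothetical sketch of what install-app.sh might contain (the artifact URL is illustrative, not part of the original template):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Provided by the provisioner's environment_vars; defaults are for local testing.
APP_VERSION="${APP_VERSION:-0.0.0}"
DEPLOY_ENV="${DEPLOY_ENV:-development}"

echo "Installing myapp ${APP_VERSION} for ${DEPLOY_ENV}"

# Fetch a versioned artifact (hypothetical URL):
# curl -fsSL "https://artifacts.example.com/myapp-${APP_VERSION}.tar.gz" \
#   | sudo tar -xz -C /opt/myapp
```

Keeping version and environment in variables means the same script serves every build; only the provisioner's environment_vars change.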

File provisioner

Copies files from the build machine to the instance. Packer connects as an unprivileged user, so copy to a path that user can write (such as /tmp) and then move the file into place with sudo:

provisioner "file" {
  source      = "./config/app.conf"
  destination = "/tmp/app.conf"
}

provisioner "shell" {
  inline = ["sudo mv /tmp/app.conf /etc/myapp/app.conf"]
}

Ansible provisioner

Runs an Ansible playbook against the instance. This is where Packer and Ansible work together beautifully:

provisioner "ansible" {
  playbook_file   = "./ansible/site.yml"
  roles_path      = "./ansible/roles"
  galaxy_file     = "./ansible/requirements.yml"
  extra_arguments = [
    "--extra-vars", "app_version=${var.app_version}",
    "--tags", "install,configure"
  ]
}

Packer handles the SSH connection. Ansible does not need an inventory file because Packer generates one dynamically pointing at the temporary instance.

Post-processors

Post-processors run after the image is built. They transform or record the output.

Manifest

Writes build metadata to a JSON file:

post-processor "manifest" {
  output     = "build-manifest.json"
  strip_path = true
  custom_data = {
    app_version = var.app_version
    build_date  = timestamp()
  }
}

The manifest file contains the AMI ID, region, and your custom data. CI pipelines parse this file to feed the AMI ID into Terraform.
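For example, given a manifest shaped like the one the post-processor writes (the AMI ID below is made up), a pipeline step can pull the ID out with jq:

```shell
# Sample manifest mimicking the manifest post-processor's output.
cat > /tmp/sample-manifest.json <<'EOF'
{
  "builds": [
    {
      "artifact_id": "us-east-1:ami-0abc1234def567890",
      "custom_data": { "app_version": "1.2.0" }
    }
  ]
}
EOF

# artifact_id is "region:ami-id"; take the last build and drop the region.
AMI_ID=$(jq -r '.builds[-1].artifact_id | split(":")[1]' /tmp/sample-manifest.json)
echo "$AMI_ID"
```

Using `.builds[-1]` picks the most recent build when the manifest accumulates entries across runs.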

Shell local

Runs a command on the build machine after the image is complete:

post-processor "shell-local" {
  inline = [
    "echo 'AMI built successfully'",
    "aws ssm put-parameter --name /app/latest-ami --value $(jq -r '.builds[-1].artifact_id' build-manifest.json) --overwrite"
  ]
}

This example stores the new AMI ID in AWS Systems Manager Parameter Store so Terraform can reference it.
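On the Terraform side, that parameter can be read back with a data source so deployments pick up the most recently baked image. A sketch (resource names are illustrative):

```hcl
data "aws_ssm_parameter" "latest_ami" {
  name = "/app/latest-ami"
}

resource "aws_launch_template" "app" {
  name_prefix   = "myapp-"
  image_id      = data.aws_ssm_parameter.latest_ami.value
  instance_type = "t3.micro"
}
```

This decouples the two pipelines: Packer publishes the parameter, and Terraform reads whatever value is current at apply time.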

Full example: Ubuntu AMI with application

Here is a complete Packer template that builds an Ubuntu AMI with a Node.js application pre-installed:

# app-image.pkr.hcl
packer {
  required_plugins {
    amazon = {
      version = ">= 1.2.0"
      source  = "github.com/hashicorp/amazon"
    }
    ansible = {
      version = ">= 1.1.0"
      source  = "github.com/hashicorp/ansible"
    }
  }
}

variable "app_version" {
  type = string
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

source "amazon-ebs" "app" {
  ami_name      = "myapp-${var.app_version}-{{timestamp}}"
  instance_type = "t3.small"
  region        = var.aws_region

  source_ami_filter {
    filters = {
      name                = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"]
  }

  ssh_username = "ubuntu"

  tags = {
    Name       = "myapp-${var.app_version}"
    AppVersion = var.app_version
    Builder    = "packer"
  }
}

build {
  sources = ["source.amazon-ebs.app"]

  provisioner "shell" {
    inline = [
      "sudo apt-get update -y",
      "sudo apt-get install -y curl gnupg2",
      "curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -",
      "sudo apt-get install -y nodejs",
      "sudo npm install -g pm2",
      "sudo mkdir -p /opt/myapp",
      "sudo chown ubuntu:ubuntu /opt/myapp"
    ]
  }

  provisioner "file" {
    source      = "./dist/"
    destination = "/opt/myapp/"
  }

  provisioner "file" {
    source      = "./config/ecosystem.config.js"
    destination = "/opt/myapp/ecosystem.config.js"
  }

  provisioner "shell" {
    inline = [
      "cd /opt/myapp && npm install --production",
      "sudo env PATH=$PATH:/usr/bin pm2 startup systemd -u ubuntu --hp /home/ubuntu",
      "pm2 start /opt/myapp/ecosystem.config.js",
      "pm2 save"
    ]
  }

  provisioner "shell" {
    inline = [
      "sudo apt-get clean",
      "sudo rm -rf /var/lib/apt/lists/*",
      "sudo rm -rf /home/ubuntu/.ssh/authorized_keys",
      "sudo rm -rf /root/.ssh/authorized_keys"
    ]
  }

  post-processor "manifest" {
    output     = "build-manifest.json"
    strip_path = true
    custom_data = {
      app_version = var.app_version
    }
  }
}

Build the image:

packer init app-image.pkr.hcl
packer validate -var "app_version=1.2.0" app-image.pkr.hcl
packer build -var "app_version=1.2.0" app-image.pkr.hcl

Integrating with CI

Packer fits naturally into a CI/CD pipeline. The typical flow is: build code, run tests, bake image, deploy image.

flowchart LR
  A[Push code] --> B[CI: build + test]
  B --> C[CI: packer build]
  C --> D[Store AMI ID]
  D --> E[CI: terraform apply]
  E --> F[Rolling deploy with new AMI]

Packer runs after tests pass. The resulting AMI ID feeds into Terraform for deployment.

A GitHub Actions workflow for this pipeline:

# .github/workflows/build-deploy.yml
name: Build and Deploy
on:
  push:
    branches: [main]

jobs:
  build-image:
    runs-on: ubuntu-latest
    outputs:
      ami_id: ${{ steps.extract.outputs.ami_id }}
    steps:
      - uses: actions/checkout@v4

      - name: Build application
        run: npm ci && npm run build

      - name: Run tests
        run: npm test

      - name: Setup Packer
        uses: hashicorp/setup-packer@main

      - name: Initialize Packer
        run: packer init app-image.pkr.hcl

      - name: Validate template
        run: packer validate -var "app_version=${{ github.sha }}" app-image.pkr.hcl

      - name: Build AMI
        run: packer build -var "app_version=${{ github.sha }}" app-image.pkr.hcl
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

      - name: Extract AMI ID
        id: extract
        run: |
          AMI_ID=$(jq -r '.builds[-1].artifact_id | split(":")[1]' build-manifest.json)
          echo "ami_id=$AMI_ID" >> "$GITHUB_OUTPUT"

  deploy:
    needs: build-image
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Deploy with new AMI
        run: |
          cd terraform
          terraform init
          terraform apply -auto-approve -var "ami_id=${{ needs.build-image.outputs.ami_id }}"
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

Image lifecycle management

AMIs accumulate. Each build creates a new one. Old images waste storage and clutter the console. Automate cleanup by tagging images with build dates and running a deregistration script:

# Deregister AMIs older than 30 days, keep at least 3
aws ec2 describe-images \
  --owners self \
  --filters "Name=tag:Builder,Values=packer" \
  --query 'Images | sort_by(@, &CreationDate) | [:-3]' \
  --output json | \
jq -r '.[] | select(.CreationDate < "'$(date -d '30 days ago' -Iseconds)'") | .ImageId' | \
while read ami_id; do
  echo "Deregistering $ami_id"
  aws ec2 deregister-image --image-id "$ami_id"
done
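One caveat: deregister-image does not delete the EBS snapshots backing the AMI, so a thorough cleanup captures the snapshot IDs before deregistering and removes them afterwards. A sketch of that extra step inside the loop above:

```shell
# Collect snapshot IDs before deregistering -- the block device
# mappings are no longer queryable once the image is gone.
snapshot_ids=$(aws ec2 describe-images --image-ids "$ami_id" \
  --query 'Images[0].BlockDeviceMappings[].Ebs.SnapshotId' --output text)

aws ec2 deregister-image --image-id "$ami_id"

for snap_id in $snapshot_ids; do
  echo "Deleting snapshot $snap_id"
  aws ec2 delete-snapshot --snapshot-id "$snap_id"
done
```

Orphaned snapshots are easy to miss because they do not appear in the AMI console view, yet they carry the bulk of the storage cost.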

What comes next

You can now bake tested, immutable machine images and deploy them through CI. The next article covers CloudFormation and CDK, where you will learn AWS-native infrastructure as code with the CloudFormation stack model, change sets, drift detection, and the Cloud Development Kit that lets you define infrastructure in TypeScript instead of YAML.
