
Cloud storage services


Storage is where your data lives. The cloud offers three fundamental storage types, each designed for different access patterns. Picking the right one saves money and keeps your application performant. Picking the wrong one leads to unnecessary costs or frustrating latency.

Storage types at a glance

graph LR
  A["Object Storage"] -->|"Files, images, backups"| S3["S3 / GCS / Blob"]
  B["Block Storage"] -->|"VM disks, databases"| EBS["EBS / PD / Managed Disks"]
  C["File Storage"] -->|"Shared filesystems"| EFS["EFS / Filestore / Azure Files"]
  style A fill:#3498db,color:#fff
  style B fill:#e74c3c,color:#fff
  style C fill:#2ecc71,color:#fff

Three storage types serve three different purposes. Most applications use at least two.

Object storage

Object storage is the workhorse of cloud storage. It stores files as objects in a flat namespace organized by keys (paths). Each object consists of data, metadata, and a unique identifier.

Provider implementations

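| Feature | AWS S3 | GCP Cloud Storage | Azure Blob Storage |
| --- | --- | --- | --- |
| Durability | 99.999999999% (11 nines) | 99.999999999% | 99.999999999% |
| Max object size | 5 TB | 5 TB | 4.75 TB (block blob; about 190.7 TiB on newer service versions) |
| Versioning | Yes | Yes | Yes |
| Event notifications | Yes | Yes | Yes |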

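Eleven nines of durability means that if you store 10 million objects, you can expect to lose one about every 10,000 years on average. Object storage is designed to make data loss from hardware failure extremely unlikely. Durability does not protect against accidental deletes or overwrites, which is what versioning is for.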
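The arithmetic behind that figure can be checked in a few lines. This is a minimal sketch that assumes each object is lost independently with an annual probability of 10^-11 (the complement of eleven nines); the function name is illustrative.

```python
def years_per_lost_object(num_objects: int, nines: int = 11) -> float:
    """Average number of years between single-object losses.

    Assumes every object is lost independently with annual probability
    10**-nines, which is the complement of the quoted durability.
    """
    inverse_loss_probability = 10 ** nines  # 1 / p, kept as an exact integer
    # Expected losses per year = num_objects * p,
    # so years per loss = (1 / p) / num_objects.
    return inverse_loss_probability / num_objects


print(years_per_lost_object(10_000_000))  # 10000.0
```

The same function shows how the picture changes at scale: with 100 billion objects, the expected rate is one loss per year.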

Common use cases

  • Static assets: Images, CSS, JavaScript served through a CDN.
  • Data lake: Raw data for analytics pipelines.
  • Backups: Database snapshots, log archives.
  • Application artifacts: Build outputs, deployment packages.

Access patterns

Object storage is optimized for throughput, not latency. Reading a single object typically takes tens of milliseconds to the first byte. Listing thousands of objects in a bucket can be slow because there is no directory structure: list calls scan keys by prefix and return results in pages. The “folders” you see in the console are a UI convenience built on key prefixes.
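To see how a flat key space can look like folders, here is a small sketch of the prefix-and-delimiter grouping that consoles perform. The function `list_level` and the sample keys are made up for illustration; real list APIs do this server-side and in pages.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split the keys under `prefix` into direct objects and "folder" prefixes."""
    objects, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key continues past a delimiter, so show it as a folder.
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


keys = [
    "logs/2024/01/app.log",
    "logs/2024/02/app.log",
    "logs/readme.txt",
    "index.html",
]
print(list_level(keys))           # (['index.html'], ['logs/'])
print(list_level(keys, "logs/"))  # (['logs/readme.txt'], ['logs/2024/'])
```

Nothing named `logs/` is stored anywhere. The folder exists only because some keys share that prefix.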

For workloads that need to list and filter objects frequently, maintain a metadata index in a database rather than relying on list operations.
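A metadata index can be as simple as one table keyed by object key. The sketch below uses an in-memory SQLite database as a stand-in for whatever database you already run; the table layout and the helpers `record_upload` and `find_keys` are assumptions for illustration, not a provider API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE objects (
           key TEXT PRIMARY KEY,
           size_bytes INTEGER,
           content_type TEXT,
           uploaded_at TEXT  -- ISO 8601 strings sort chronologically
       )"""
)
conn.execute("CREATE INDEX idx_type_time ON objects (content_type, uploaded_at)")


def record_upload(key, size_bytes, content_type, uploaded_at):
    """Call this alongside every successful upload to keep the index current."""
    conn.execute(
        "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
        (key, size_bytes, content_type, uploaded_at),
    )


def find_keys(content_type, since):
    """Return keys of one content type uploaded at or after `since`."""
    rows = conn.execute(
        "SELECT key FROM objects WHERE content_type = ? AND uploaded_at >= ? "
        "ORDER BY uploaded_at",
        (content_type, since),
    )
    return [row[0] for row in rows]


record_upload("img/a.png", 2048, "image/png", "2024-01-05T10:00:00Z")
record_upload("img/b.png", 4096, "image/png", "2024-03-01T09:30:00Z")
record_upload("docs/c.pdf", 9000, "application/pdf", "2024-03-02T12:00:00Z")
print(find_keys("image/png", "2024-02-01T00:00:00Z"))  # ['img/b.png']
```

The query hits a database index instead of paging through every key in the bucket. The trade-off is that the index must be kept in sync with the bucket, for example by updating it from event notifications.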

Security

Object storage buckets are a top target for data breaches. Misconfigured public access has exposed sensitive data at multiple organizations.

Best practices:

  • Block all public access by default. Enable it only for specific buckets that serve static websites.
  • Use bucket policies and IAM to control access.
  • Enable server-side encryption. All providers support it with minimal performance impact.
  • Enable access logging to track who reads and writes objects.
  • Use pre-signed URLs to grant temporary access to private objects.
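The idea behind pre-signed URLs is that the URL itself carries an expiry time and a signature that only the key holder could have produced. The sketch below shows that idea with a plain HMAC. It is a simplified illustration, not any provider's actual signing algorithm, and `SECRET`, `sign_url`, and `is_valid` are hypothetical names.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical signing key; never hard-code a real one


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry timestamp and an HMAC over (path, expiry) to the URL."""
    message = f"{path}\n{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires_at, 'signature': signature})}"


def is_valid(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept the URL only if it has not expired and the signature matches."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    message = f"{parts.path}\n{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return now <= expires_at and hmac.compare_digest(signature, expected)


url = sign_url("/private/report.pdf", expires_at=1_700_000_600)
print(is_valid(url, now=1_700_000_000))  # True
print(is_valid(url, now=1_700_001_000))  # False
```

Because the path and expiry are covered by the signature, changing either one invalidates the URL. In practice you would call your provider's SDK to generate pre-signed URLs rather than rolling your own.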

Storage classes and cost

Not all data needs the same level of availability. Storage classes let you trade access speed for lower cost.

Cheaper storage classes charge more for retrieval. Match the class to your access frequency.

Provider storage class mapping

| Access pattern | AWS S3 | GCP Cloud Storage | Azure Blob |
| --- | --- | --- | --- |
| Frequent | Standard | Standard | Hot |
| Infrequent | Standard-IA | Nearline | Cool |
| Archival | Glacier | Coldline | Archive |
| Deep archival | Glacier Deep Archive | Archive | Archive |
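The trade-off between storage price and retrieval price is easy to model. The sketch below compares three classes under a given access pattern. The per-GB prices are made-up round numbers for illustration only, not any provider's published rates, and the model ignores request fees and minimum storage durations.

```python
# Hypothetical prices in USD per GB: (storage per month, retrieval).
# Real prices vary by provider and region.
PRICES = {
    "frequent": (0.020, 0.000),
    "infrequent": (0.010, 0.010),
    "archival": (0.004, 0.030),
}


def monthly_cost(storage_class, stored_gb, retrieved_gb):
    """Storage charge plus retrieval charge for one month."""
    storage_price, retrieval_price = PRICES[storage_class]
    return stored_gb * storage_price + retrieved_gb * retrieval_price


def cheapest_class(stored_gb, retrieved_gb):
    """Pick the class with the lowest monthly cost for this access pattern."""
    return min(PRICES, key=lambda name: monthly_cost(name, stored_gb, retrieved_gb))


print(cheapest_class(1000, 0))     # archival
print(cheapest_class(1000, 500))   # infrequent
print(cheapest_class(1000, 2000))  # frequent
```

With these assumed prices, data that is read twice a month is cheapest in the frequent class, while data that is never read is cheapest in the archival class.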

Lifecycle policies

Lifecycle policies automate transitions between storage classes. Define rules based on object age or access patterns.

{
  "Rules": [
    {
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}

This policy moves objects to cheaper storage as they age and deletes them after 2,555 days (7 years). For log data, compliance archives, and backups, lifecycle policies can cut storage costs by 80% or more, depending on how much of the data ends up in the archive tiers.
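To check what a policy like this does to an object of a given age, you can evaluate the rules by hand. The sketch below is a simplified model of age-based rules only; the function `storage_class_at` is illustrative, and real lifecycle engines run on a daily schedule rather than at the exact moment a threshold is crossed.

```python
def storage_class_at(age_days, transitions, expiration_days=None, initial="STANDARD"):
    """Return the storage class for an object of this age, or None once expired."""
    if expiration_days is not None and age_days >= expiration_days:
        return None  # the object has been deleted
    current = initial
    # Apply transitions in order of age; the last one reached wins.
    for rule in sorted(transitions, key=lambda r: r["Days"]):
        if age_days >= rule["Days"]:
            current = rule["StorageClass"]
    return current


transitions = [
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 90, "StorageClass": "GLACIER"},
    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
]
for age in (10, 45, 100, 400, 2555):
    print(age, storage_class_at(age, transitions, expiration_days=2555))
# 10 STANDARD
# 45 STANDARD_IA
# 100 GLACIER
# 400 DEEP_ARCHIVE
# 2555 None
```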

Cross-region replication

For disaster recovery and compliance, you can replicate objects across regions automatically.

graph LR
  Primary["Primary Bucket<br/>us-east-1"] -->|"Async replication"| DR["DR Bucket<br/>eu-west-1"]
  Primary -->|"Same-region replica"| SRR["Read Replica<br/>us-east-1"]
  style Primary fill:#3498db,color:#fff
  style DR fill:#e74c3c,color:#fff
  style SRR fill:#2ecc71,color:#fff

Cross-region replication adds durability. Same-region replication adds availability and compliance options.

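Replication is asynchronous. Most objects replicate within minutes, but by default there is no guaranteed replication time. Some providers sell a bounded-time option: on AWS, S3 Replication Time Control adds an SLA-backed 15-minute target at extra cost. For critical data, monitor replication metrics rather than assuming the replica is current.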

Costs to consider: you pay for storage in both regions, data transfer between regions, and API requests for the replication process.
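One way to verify replication independently of provider dashboards is to compare inventories of the two buckets. The sketch below assumes you already have the primary's keys with upload times and the set of keys present in the replica. How you obtain those listings is left out, and `overdue_replicas` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def overdue_replicas(primary, replica_keys, now, max_lag=timedelta(minutes=15)):
    """Return keys missing from the replica for longer than `max_lag`.

    `primary` maps key -> upload time; `replica_keys` is the set of keys
    that exist in the destination bucket.
    """
    return sorted(
        key
        for key, uploaded_at in primary.items()
        if key not in replica_keys and now - uploaded_at > max_lag
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
primary = {
    "a.bin": now - timedelta(hours=2),    # replicated
    "b.bin": now - timedelta(minutes=40), # missing and overdue
    "c.bin": now - timedelta(minutes=5),  # missing but still within the window
}
print(overdue_replicas(primary, {"a.bin"}, now))  # ['b.bin']
```

Recently uploaded objects are expected to be missing for a while, so only objects older than the lag threshold are reported.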

Block storage

Block storage provides raw disk volumes that you attach to VMs. They behave like physical hard drives. Your operating system formats them with a filesystem and mounts them.

Provider implementations

| Feature | AWS EBS | GCP Persistent Disk | Azure Managed Disks |
| --- | --- | --- | --- |
| Volume types | gp3, io2, st1, sc1 | Standard, Balanced, SSD, Extreme | Standard HDD, Standard SSD, Premium SSD, Ultra |
| Max IOPS | 256,000 (io2) | 120,000 (Extreme) | 160,000 (Ultra) |
| Max throughput | 4,000 MB/s | 2,400 MB/s | 4,000 MB/s |
| Snapshots | Yes | Yes | Yes |

Choosing a volume type

  • General purpose SSD (gp3, Balanced, Standard SSD): Default choice for most workloads. Good balance of price and performance.
  • Provisioned IOPS (io2, Extreme, Ultra): Databases with demanding I/O requirements. Expensive.
  • Throughput-optimized HDD (st1, Standard HDD): Large sequential workloads like log processing. Cheap per GB.
  • Cold HDD (sc1): Infrequently accessed data. Cheapest block storage.

Snapshots

Block storage supports point-in-time snapshots. Snapshots are incremental. The first snapshot copies the entire volume. Subsequent snapshots only copy changed blocks. Providers keep snapshots in object storage (EBS snapshots are stored in S3), which costs much less per GB than the volume itself.
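The incremental behavior is easier to see in a toy model. The sketch below treats a volume as a list of blocks and tracks which blocks changed since the last snapshot. It is an illustration of the concept only, not how any provider implements snapshots, and the `Volume` class is hypothetical.

```python
class Volume:
    """Toy block volume with incremental snapshots."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # Before the first snapshot, every block counts as changed.
        self.dirty = set(range(num_blocks))
        self.snapshots = []  # each snapshot maps block index -> data

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)

    def snapshot(self):
        """Store only the blocks changed since the previous snapshot."""
        self.snapshots.append({i: self.blocks[i] for i in self.dirty})
        self.dirty.clear()
        return len(self.snapshots) - 1  # snapshot id

    def restore(self, snapshot_id):
        """Rebuild the volume by replaying snapshots up to `snapshot_id`."""
        merged = {}
        for snap in self.snapshots[: snapshot_id + 1]:
            merged.update(snap)
        return [merged[i] for i in range(len(self.blocks))]


vol = Volume(4)
vol.write(0, b"a")
first = vol.snapshot()   # full copy: 4 blocks
vol.write(1, b"b")
second = vol.snapshot()  # incremental: 1 block
print(len(vol.snapshots[first]), len(vol.snapshots[second]))  # 4 1
print(vol.restore(second))  # [b'a', b'b', b'', b'']
```

Restoring the second snapshot needs blocks from the first one too, which is why providers track those dependencies for you when you delete an older snapshot.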

Use snapshots for:

  • Backup before risky operations.
  • Creating new volumes from a known-good state.
  • Cross-region disaster recovery by copying snapshots to another region.

File storage

File storage provides shared filesystems accessible by multiple VMs simultaneously using standard protocols (NFS, SMB).

| Feature | AWS EFS | GCP Filestore | Azure Files |
| --- | --- | --- | --- |
| Protocol | NFS v4 | NFS v3/v4 | SMB 3.0, NFS v4.1 |
| Max size | Petabytes | 100 TB | 100 TB |
| Performance modes | General, Max I/O | Basic, High Scale, Enterprise | Standard, Premium |
| Auto-scaling | Yes (elastic) | No (provisioned) | Yes |

When to use file storage

File storage is necessary when multiple compute instances need read/write access to the same files. CMS platforms, shared configuration files, machine learning training data, and legacy applications that expect a POSIX filesystem are common use cases.

If your application can be redesigned to use object storage with an API, do that instead. Object storage is cheaper, more durable, and scales better. File storage fills the gap for workloads that genuinely need shared filesystem semantics.

Choosing the right storage type

| Question | Object | Block | File |
| --- | --- | --- | --- |
| Need HTTP API access? | Yes | No | No |
| Need filesystem mount? | No | Yes | Yes |
| Shared across instances? | Via API | No (usually) | Yes |
| Best for unstructured data? | Yes | No | No |
| Best for database volumes? | No | Yes | No |
| Cost per GB (relative) | Low | Medium | High |
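The first rows of the table reduce to two questions, which can be written as a small decision helper. This is a deliberately coarse sketch: the function `suggest_storage` is illustrative, and real decisions also weigh cost, latency, and durability.

```python
def suggest_storage(needs_filesystem_mount: bool, shared_across_instances: bool) -> str:
    """Map the two main questions from the table to a storage type."""
    if not needs_filesystem_mount:
        # Anything that can go through an HTTP API belongs in object storage.
        return "object"
    # A mounted filesystem: shared access needs file storage,
    # a single VM's disk is block storage.
    return "file" if shared_across_instances else "block"


print(suggest_storage(needs_filesystem_mount=False, shared_across_instances=True))   # object
print(suggest_storage(needs_filesystem_mount=True, shared_across_instances=False))   # block
print(suggest_storage(needs_filesystem_mount=True, shared_across_instances=True))    # file
```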

Cost optimization strategies

  1. Audit storage usage monthly: Identify stale data. Delete or archive it.
  2. Enable lifecycle policies on every bucket: Even a simple 90-day transition to infrequent access saves money.
  3. Right-size block volumes: Provisioned IOPS volumes cost more. Monitor actual IOPS and downgrade if underutilized.
  4. Delete old snapshots: Snapshots accumulate. Set retention policies.
  5. Use Intelligent-Tiering: AWS S3 Intelligent-Tiering and GCP Autoclass automatically move objects between tiers based on access patterns. Good for unpredictable workloads.
  6. Compress before storing: Gzip or Zstandard compression reduces storage and transfer costs.
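Compression before upload is often a one-line change. The sketch below uses gzip from the Python standard library on a synthetic, repetitive log. The sample data and the helper `compress_for_upload` are made up for illustration, and the ratio you get depends entirely on your data.

```python
import gzip


def compress_for_upload(data: bytes) -> bytes:
    """Gzip the payload before writing it to object storage."""
    return gzip.compress(data)


# Synthetic log: repetitive text like this compresses well.
raw = "".join(
    f"2024-01-01T00:00:{i % 60:02d}Z INFO request served path=/api/items status=200\n"
    for i in range(1000)
).encode("utf-8")

compressed = compress_for_upload(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes")
assert gzip.decompress(compressed) == raw  # lossless round trip
```

Already-compressed formats such as JPEG, MP4, or Parquet with built-in compression gain little from a second pass, so measure before applying this everywhere.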

What comes next

Data needs a home beyond raw files. The next article covers managed databases in the cloud: relational, NoSQL, caching, and search services. You will learn when to use managed offerings versus running your own.
