Cloud storage services
In this series (10 parts)
- Cloud fundamentals and the shared responsibility model
- Compute: VMs, containers, serverless
- Networking in the cloud
- Cloud storage services
- Managed databases in the cloud
- Cloud IAM and access control
- Serverless architecture patterns
- Cloud cost management
- Multi-cloud and cloud-agnostic design
- Cloud Well-Architected Framework
Storage is where your data lives. The cloud offers three fundamental storage types, each designed for different access patterns. Picking the right one saves money and keeps your application performant. Picking the wrong one leads to unnecessary costs or frustrating latency.
Storage types at a glance
```mermaid
graph LR
    A["Object Storage"] -->|"Files, images, backups"| S3["S3 / GCS / Blob"]
    B["Block Storage"] -->|"VM disks, databases"| EBS["EBS / PD / Managed Disks"]
    C["File Storage"] -->|"Shared filesystems"| EFS["EFS / Filestore / Azure Files"]
    style A fill:#3498db,color:#fff
    style B fill:#e74c3c,color:#fff
    style C fill:#2ecc71,color:#fff
```
Three storage types serve three different purposes. Most applications use at least two.
Object storage
Object storage is the workhorse of cloud storage. It stores files as objects in a flat namespace organized by keys (paths). Each object consists of data, metadata, and a unique identifier.
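The object model can be pictured as a flat key-value mapping. A minimal in-memory sketch, purely illustrative and not any provider's actual API:

```python
# Minimal in-memory model of an object store: a flat namespace mapping
# each key (the object's unique identifier) to data plus metadata.
# Illustrative only -- not any provider's actual API.

class ObjectStore:
    def __init__(self):
        self._objects = {}  # flat namespace: key -> (data, metadata)

    def put(self, key, data, metadata=None):
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key):
        data, metadata = self._objects[key]
        return data, metadata

store = ObjectStore()
store.put("images/logo.png", b"\x89PNG...", {"content-type": "image/png"})
data, meta = store.get("images/logo.png")
```

Note that `images/logo.png` is a single key; the slash has no special meaning to the store itself.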
Provider implementations
| Feature | AWS S3 | GCP Cloud Storage | Azure Blob Storage |
|---|---|---|---|
| Durability | 99.999999999% (11 nines) | 99.999999999% | 99.999999999% |
| Max object size | 5 TB | 5 TB | 4.75 TB (block blob) |
| Versioning | Yes | Yes | Yes |
| Event notifications | Yes | Yes | Yes |
Eleven nines of durability means that if you store 10 million objects, you can expect to lose a single object, on average, once every 10,000 years. Object storage is designed to never lose your data.
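The arithmetic behind that claim is easy to check directly:

```python
# Expected object loss at 11 nines of annual durability.
durability = 0.99999999999          # 11 nines, per object per year
objects = 10_000_000

# Expected losses per year across all stored objects.
expected_losses_per_year = objects * (1 - durability)   # ~0.0001

# Invert to get the average interval between lost objects.
years_per_lost_object = 1 / expected_losses_per_year    # ~10,000 years
```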
Common use cases
- Static assets: Images, CSS, JavaScript served through a CDN.
- Data lake: Raw data for analytics pipelines.
- Backups: Database snapshots, log archives.
- Application artifacts: Build outputs, deployment packages.
Access patterns
Object storage is optimized for throughput, not latency. Reading a single object takes milliseconds. Listing thousands of objects in a bucket can be slow because there is no directory structure. The “folders” you see in the console are a UI convenience built on key prefixes.
For workloads that need to list and filter objects frequently, maintain a metadata index in a database rather than relying on list operations.
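One common pattern is a small metadata index kept alongside the bucket, sketched here with SQLite. The schema and fields are illustrative assumptions:

```python
import sqlite3

# Illustrative metadata index: instead of paging through slow bucket
# list operations, record each object's key and attributes at upload
# time, then query the index. Schema and fields are assumptions.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE objects (
        key        TEXT PRIMARY KEY,
        size_bytes INTEGER,
        uploaded   TEXT
    )
""")

# Writes happen alongside each upload to the bucket.
rows = [
    ("logs/2024/01/app.log", 1_048_576, "2024-01-15"),
    ("logs/2024/02/app.log", 2_097_152, "2024-02-15"),
    ("images/logo.png",         14_336, "2024-01-03"),
]
db.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)

# Filtering by prefix is now an indexed query, not a bucket scan.
logs = db.execute(
    "SELECT key FROM objects WHERE key LIKE 'logs/%' ORDER BY key"
).fetchall()
```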
Security
Object storage buckets are a top target for data breaches. Misconfigured public access has exposed sensitive data at multiple organizations.
Best practices:
- Block all public access by default. Enable it only for specific buckets that serve static websites.
- Use bucket policies and IAM to control access.
- Enable server-side encryption. All providers support it with minimal performance impact.
- Enable access logging to track who reads and writes objects.
- Use pre-signed URLs to grant temporary access to private objects.
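Pre-signed URLs work by having your server sign the object key and an expiry time, so the storage service can verify the grant without a round trip to your application. A conceptual sketch using plain HMAC; real providers use their own signing schemes (for example AWS Signature Version 4), and the hostname and key here are made up:

```python
import hashlib
import hmac
import time

SECRET = b"server-side-signing-key"  # assumption: stays on the server

def presign(key: str, expires_in: int, now: float) -> str:
    """Return a URL granting temporary access to a private object."""
    expiry = int(now) + expires_in
    msg = f"{key}:{expiry}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"https://storage.example.com/{key}?expires={expiry}&sig={sig}"

def verify(key: str, expiry: int, sig: str, now: float) -> bool:
    """Service-side check: signature must match and not be expired."""
    msg = f"{key}:{expiry}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and now < expiry

now = time.time()
url = presign("reports/q3.pdf", expires_in=3600, now=now)
```

Anyone holding the URL can fetch the object until the expiry passes; tampering with the key or expiry invalidates the signature.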
Storage classes and cost
Not all data needs the same level of availability. Storage classes let you trade access speed for lower cost.
Cheaper storage classes charge more for retrieval. Match the class to your access frequency.
Provider storage class mapping
| Access pattern | AWS S3 | GCP Cloud Storage | Azure Blob |
|---|---|---|---|
| Frequent | Standard | Standard | Hot |
| Infrequent | Standard-IA | Nearline | Cool |
| Archival | Glacier | Coldline | Archive |
| Deep archival | Glacier Deep Archive | Archive | Archive |
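The storage-versus-retrieval tradeoff can be quantified. With illustrative prices (assumptions, not current list prices), an infrequent-access class only wins below a certain monthly retrieval volume:

```python
# Break-even analysis between Standard and an infrequent-access class.
# All prices are illustrative assumptions, not current list prices.
STANDARD_PER_GB = 0.023      # $/GB-month, no retrieval fee
IA_PER_GB       = 0.0125     # $/GB-month
IA_RETRIEVAL    = 0.01       # $/GB retrieved

def monthly_cost(stored_gb, retrieved_gb, per_gb, retrieval_fee=0.0):
    return stored_gb * per_gb + retrieved_gb * retrieval_fee

stored = 1000  # GB

# Even reading the full dataset once a month, IA edges out Standard
# at these prices -- but only barely.
std_cost = monthly_cost(stored, stored, STANDARD_PER_GB)
ia_cost = monthly_cost(stored, stored, IA_PER_GB, IA_RETRIEVAL)

# Break-even: the retrieval volume where both classes cost the same.
break_even_gb = stored * (STANDARD_PER_GB - IA_PER_GB) / IA_RETRIEVAL
```

At these assumed prices the break-even sits just above one full read per month; read more than that and Standard is cheaper.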
Lifecycle policies
Lifecycle policies automate transitions between storage classes. Define rules based on object age or access patterns.
```json
{
  "Rules": [
    {
      "ID": "archive-and-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```
This policy moves objects to cheaper storage as they age and deletes them after 7 years. For log data, compliance archives, and backups, lifecycle policies can cut storage costs by 80% or more.
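The savings claim can be sanity-checked with rough arithmetic. Prices below are illustrative assumptions, not current list prices:

```python
# Rough cost comparison for 1 TB of logs kept 7 years, with and
# without lifecycle transitions. Prices are illustrative assumptions.
PRICES = {  # $/GB-month
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}
GB = 1024
MONTHS = 7 * 12  # ~2555 days

# Baseline: everything stays in Standard for 7 years.
no_lifecycle = GB * PRICES["STANDARD"] * MONTHS

# Policy above: ~1 month Standard, ~2 months IA, ~9 months Glacier,
# then Deep Archive for the remaining 72 months.
with_lifecycle = GB * (
    PRICES["STANDARD"] * 1
    + PRICES["STANDARD_IA"] * 2
    + PRICES["GLACIER"] * 9
    + PRICES["DEEP_ARCHIVE"] * (MONTHS - 12)
)

savings = 1 - with_lifecycle / no_lifecycle  # well above 80%
```

Retrieval and transition request fees are omitted here; for rarely-read archives they change the total only slightly.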
Cross-region replication
For disaster recovery and compliance, you can replicate objects across regions automatically.
```mermaid
graph LR
    Primary["Primary Bucket<br/>us-east-1"] -->|"Async replication"| DR["DR Bucket<br/>eu-west-1"]
    Primary -->|"Same-region replica"| SRR["Read Replica<br/>us-east-1"]
    style Primary fill:#3498db,color:#fff
    style DR fill:#e74c3c,color:#fff
    style SRR fill:#2ecc71,color:#fff
```
Cross-region replication adds durability. Same-region replication adds availability and compliance options.
Replication is asynchronous. Objects typically replicate within 15 minutes, but there is no guaranteed SLA by default (AWS sells a Replication Time Control option with a 15-minute SLA). For critical data, monitor replication metrics and alert when lag exceeds your recovery point objective.
Costs to consider: you pay for storage in both regions, data transfer between regions, and API requests for the replication process.
Block storage
Block storage provides raw disk volumes that you attach to VMs. They behave like physical hard drives. Your operating system formats them with a filesystem and mounts them.
Provider implementations
| Feature | AWS EBS | GCP Persistent Disk | Azure Managed Disks |
|---|---|---|---|
| Volume types | gp3, io2, st1, sc1 | Standard, Balanced, SSD, Extreme | Standard HDD, Standard SSD, Premium SSD, Ultra |
| Max IOPS | 256,000 (io2) | 120,000 (Extreme) | 160,000 (Ultra) |
| Max throughput | 4,000 MB/s | 2,400 MB/s | 4,000 MB/s |
| Snapshots | Yes | Yes | Yes |
Choosing a volume type
- General purpose SSD (gp3, Balanced, Standard SSD): Default choice for most workloads. Good balance of price and performance.
- Provisioned IOPS (io2, Extreme, Ultra): Databases with demanding I/O requirements. Expensive.
- Throughput-optimized HDD (st1, Standard HDD): Large sequential workloads like log processing. Cheap per GB.
- Cold HDD (sc1): Infrequently accessed data. Cheapest block storage.
Snapshots
Block storage supports point-in-time snapshots. Snapshots are incremental: the first snapshot copies the entire volume, and each subsequent snapshot copies only the blocks that changed since the last one. Providers store snapshots in object storage (S3 for EBS snapshots), so they cost far less per GB than the volume itself.
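The incremental mechanism can be pictured as diffing block maps between snapshots. A simplified sketch, not any provider's actual implementation:

```python
# Simplified model of incremental snapshots: each snapshot records
# only the blocks that changed since the previous one. Illustrative.

def snapshot(volume_blocks, previous=None):
    """Return {block_index: data} for blocks that differ from
    `previous`, the full block map as of the last snapshot."""
    previous = previous or {}
    return {
        i: data
        for i, data in volume_blocks.items()
        if previous.get(i) != data
    }

# First snapshot: full copy of a 4-block volume.
v1 = {0: b"boot", 1: b"appA", 2: b"data", 3: b"logs"}
snap1 = snapshot(v1)

# Only block 3 changes before the second snapshot.
v2 = {**v1, 3: b"logs+today"}
snap2 = snapshot(v2, previous=v1)
```

Restoring a volume means replaying the full first snapshot plus every later delta, which the provider handles transparently.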
Use snapshots for:
- Backup before risky operations.
- Creating new volumes from a known-good state.
- Cross-region disaster recovery by copying snapshots to another region.
File storage
File storage provides shared filesystems accessible by multiple VMs simultaneously using standard protocols (NFS, SMB).
| Feature | AWS EFS | GCP Filestore | Azure Files |
|---|---|---|---|
| Protocol | NFS v4 | NFS v3/v4 | SMB 3.0, NFS v4.1 |
| Max size | Petabytes | 100 TB | 100 TB |
| Performance modes | General, Max I/O | Basic, High Scale, Enterprise | Standard, Premium |
| Auto-scaling | Yes (elastic) | No (provisioned) | Yes |
When to use file storage
File storage is necessary when multiple compute instances need read/write access to the same files. CMS platforms, shared configuration files, machine learning training data, and legacy applications that expect a POSIX filesystem are common use cases.
If your application can be redesigned to use object storage with an API, do that instead. Object storage is cheaper, more durable, and scales better. File storage fills the gap for workloads that genuinely need shared filesystem semantics.
Choosing the right storage type
| Question | Object | Block | File |
|---|---|---|---|
| Need HTTP API access? | Yes | No | No |
| Need filesystem mount? | No | Yes | Yes |
| Shared across instances? | Via API | No (usually) | Yes |
| Best for unstructured data? | Yes | No | No |
| Best for database volumes? | No | Yes | No |
| Cost per GB (relative) | Low | Medium | High |
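The decision table collapses into a small rule of thumb. This is a simplification for illustration, not an exhaustive chooser:

```python
# Rule-of-thumb chooser mirroring the decision table above.
# A simplification for illustration, not an exhaustive guide.

def pick_storage(needs_http_api: bool, needs_fs_mount: bool,
                 shared_across_instances: bool) -> str:
    if needs_http_api:
        return "object"
    if needs_fs_mount and shared_across_instances:
        return "file"
    if needs_fs_mount:
        return "block"
    return "object"  # default: cheapest and most durable

choice = pick_storage(needs_http_api=False, needs_fs_mount=True,
                      shared_across_instances=True)
```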
Cost optimization strategies
- Audit storage usage monthly: Identify stale data. Delete or archive it.
- Enable lifecycle policies on every bucket: Even a simple 90-day transition to infrequent access saves money.
- Right-size block volumes: Provisioned IOPS volumes cost more. Monitor actual IOPS and downgrade if underutilized.
- Delete old snapshots: Snapshots accumulate. Set retention policies.
- Use Intelligent-Tiering: AWS S3 Intelligent-Tiering and GCP Autoclass automatically move objects between tiers based on access patterns. Good for unpredictable workloads.
- Compress before storing: Gzip or Zstandard compression reduces storage and transfer costs.
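Compression savings are easy to measure before committing, for example with Python's stdlib gzip. Ratios are data-dependent: repetitive text shrinks dramatically, while already-compressed formats (JPEG, MP4) barely shrink at all:

```python
import gzip

# Measure the compression ratio on a sample before deciding to
# compress at rest. This sample is highly repetitive log text,
# which compresses extremely well; real data will vary.
log_line = b"2024-01-15T10:00:00Z INFO request served path=/api/v1/items\n"
raw = log_line * 10_000

compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)  # fraction of original size
```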
What comes next
Data needs a home beyond raw files. The next article covers managed databases in the cloud: relational, NoSQL, caching, and search services. You will learn when to use managed offerings versus running your own.