Apr 1, 2026 · 14 min read · DevOps

What DevOps actually is

In this series (10 parts)

DevOps is not a tool, a job title, or a team name. It is a set of practices and a cultural philosophy that brings development and operations teams together so software moves from idea to production faster and more reliably. The word itself is a portmanteau of “development” and “operations,” coined around 2009, but the ideas behind it draw from lean manufacturing, systems thinking, and decades of pain caused by organizational silos.

The wall between dev and ops

For most of software history, developers wrote code and then handed it to a separate operations team to deploy and run. The incentives were misaligned from the start. Developers were rewarded for shipping features. Operations teams were rewarded for stability. Every deployment was a negotiation.

graph LR
  Dev["Development Team"] -->|"throw over the wall"| Ops["Operations Team"]
  Ops -->|"file a bug back"| Dev
  Dev -.->|"slow feedback"| Dev
  Ops -.->|"firefighting"| Ops

The traditional silo model. Dev builds, ops runs, and a handoff boundary separates them. Feedback is slow and blame flows both ways.

This model created predictable failure modes:

Long release cycles. Deployments happened monthly or quarterly because coordination overhead was enormous.
Fragile releases. Large batches of changes meant each release carried high risk.
Blame culture. When production broke, dev blamed ops for bad infrastructure. Ops blamed dev for bad code.
Knowledge silos. Nobody understood the full system end to end.

DevOps as culture, not tooling

Buying Jenkins does not make you a DevOps organization. Neither does renaming your ops team to “DevOps.” The core insight is that you need shared ownership of the entire delivery lifecycle. Developers own operational concerns like monitoring and on-call. Operations engineers participate in design decisions and code reviews.

This cultural shift produces a feedback loop instead of a handoff boundary.

graph LR
  Plan --> Code --> Build --> Test --> Release --> Deploy --> Operate --> Monitor --> Plan

The DevOps feedback loop. Every stage feeds information back to planning, compressing the cycle from months to hours.

The CALMS framework

John Willis and Damon Edwards summarized DevOps as CAMS (Culture, Automation, Measurement, Sharing); Jez Humble later added Lean, producing the five pillars known as CALMS:

Culture. Shared responsibility across dev and ops. Blameless postmortems. Psychological safety to experiment and fail.

Automation. Automate everything repeatable: builds, tests, deployments, infrastructure provisioning. Automation reduces toil and human error simultaneously.

Lean. Small batch sizes. Limit work in progress. Optimize for flow, not resource utilization. Ship small changes frequently rather than large changes infrequently.

Measurement. If you cannot measure it, you cannot improve it. Track deployment frequency, lead time for changes, change failure rate, and mean time to recovery. These are the four DORA metrics, which together measure software delivery throughput and stability.
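Once you have deployment records, the four metrics reduce to simple counts, ratios, and durations. The Python below is a minimal sketch that assumes a hypothetical `Deployment` record carrying commit, deploy, failure, and restoration timestamps. The field names, the use of the median for lead time, and measuring recovery from the failing deployment are illustrative choices, not DORA's official definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; real data would come from CI/CD and incident tooling."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did this deployment cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


def dora_metrics(deploys: List[Deployment], period_days: int) -> Dict[str, float]:
    """Approximate the four DORA metrics over a reporting window of period_days."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deploys if d.failed]
    # Simplification: recovery time runs from the failing deployment to restoration.
    recoveries = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(hours(d.deployed_at - d.committed_at) for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery_hours": sum(recoveries) / len(recoveries) if recoveries else 0.0,
    }


if __name__ == "__main__":
    t0 = datetime(2026, 1, 5, 9, 0)
    sample = [
        Deployment(t0, t0 + timedelta(hours=4)),
        Deployment(t0, t0 + timedelta(hours=2), failed=True, restored_at=t0 + timedelta(hours=3)),
        Deployment(t0, t0 + timedelta(hours=6)),
        Deployment(t0, t0 + timedelta(hours=8)),
    ]
    print(dora_metrics(sample, period_days=2))
```

The arithmetic is the easy part; the value comes from collecting these timestamps automatically so the numbers are trustworthy enough to act on.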

Sharing. Break down knowledge silos. Share dashboards, share runbooks, share on-call responsibility. Transparency accelerates learning.

Pillar	Question it answers	Anti-pattern it prevents
Culture	Do teams trust each other?	Blame games after incidents
Automation	Is manual toil minimized?	“Works on my machine” deployments
Lean	Are batch sizes small?	Quarterly mega-releases
Measurement	Do we know our throughput?	Gut-feel decision making
Sharing	Is knowledge accessible?	Single points of failure in people

Platform engineering vs DevOps

Platform engineering emerged because “you build it, you run it” does not scale indefinitely. When every team manages its own CI/CD pipelines, Kubernetes clusters, and monitoring stacks, cognitive load explodes. Platform teams build internal developer platforms (IDPs) that provide self-service capabilities: deploy a service, provision a database, set up monitoring. The developers consume the platform. The platform team maintains it.

This is not a return to the old silo model. The key difference is that platform teams build products for internal developers. They treat developers as customers. They gather feedback, iterate on the platform, and measure adoption. The wall is gone. The abstraction is intentional.

When platform engineering makes sense:

Organizations with more than 10 product teams
When teams spend more than 30% of time on infrastructure concerns
When deployment patterns have converged enough to standardize

When it does not:

Small startups where every engineer touches everything
Early-stage organizations still discovering their deployment patterns

SRE vs DevOps

Site Reliability Engineering (SRE) originated at Google and shares most of DevOps’ goals but adds prescriptive practices. Ben Treynor Sloss described SRE as “what happens when you ask a software engineer to design an operations function.”

Key SRE concepts:

Service Level Objectives (SLOs). A target value for a measured service level indicator (SLI), usually expressed as a percentage over a time window. “99.9% of requests complete in under 200ms.”
Error budgets. If your SLO is 99.9%, you have a 0.1% error budget. While you have budget remaining, ship features. When the budget runs out, focus on reliability.
Toil budgets. SREs should spend no more than 50% of their time on operational toil. The rest goes to engineering work that reduces future toil.
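The error-budget and toil arithmetic is small enough to sketch directly. The Python below assumes an SLO expressed as a fraction (0.999 for 99.9%) and a 30-day window by default; all function names are hypothetical. `release_policy` simply encodes the ship-or-stabilize rule described above, and `within_toil_cap` encodes the 50% toil limit.

```python
from typing import Dict


def error_budget(slo: float, window_days: int = 30) -> Dict[str, float]:
    """Translate an SLO (e.g. 0.999) into the unreliability it permits over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    budget_fraction = 1 - slo
    minutes_in_window = window_days * 24 * 60
    return {
        "budget_fraction": budget_fraction,
        "allowed_downtime_minutes": budget_fraction * minutes_in_window,
    }


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative means overspent)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures


def release_policy(remaining: float) -> str:
    # Ship features while budget remains; once it is spent, prioritize reliability work.
    return "ship features" if remaining > 0 else "focus on reliability"


def within_toil_cap(toil_hours: float, total_hours: float, cap: float = 0.5) -> bool:
    # True if operational toil stays at or below the cap (50% by default).
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return toil_hours / total_hours <= cap
```

With a 99.9% SLO over 30 days, `error_budget(0.999)` allows roughly 43.2 minutes of downtime; tightening the SLO to 99.99% shrinks that to about 4.3 minutes, which is why each extra nine is expensive.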

DevOps is the philosophy. SRE is one concrete implementation of that philosophy. You can practice DevOps without SRE, and SRE without calling it DevOps, but the goals overlap heavily.

graph TB
  DevOps["DevOps (Culture + Practices)"]
  SRE["SRE (Prescriptive Implementation)"]
  PE["Platform Engineering (Internal Products)"]
  DevOps --> SRE
  DevOps --> PE
  SRE -.->|"error budgets, SLOs"| PE
  PE -.->|"self-service platforms"| SRE

DevOps is the umbrella philosophy. SRE and platform engineering are complementary implementations that often coexist in mature organizations.

Why this matters now

Cloud infrastructure made compute cheap. Containers made packaging portable. But technology alone did not fix the delivery bottleneck. Organizations that adopted DevOps culture alongside tooling saw order-of-magnitude improvements in deployment frequency and stability. The annual State of DevOps reports consistently show that elite performers deploy on demand, recover from failures in under an hour, and keep change failure rates low, although the exact thresholds for each performance tier shift from one report to the next.

The tools are enablers. The culture is the multiplier.

What comes next

The next article in this series breaks down the software delivery lifecycle stage by stage and introduces the DORA metrics that measure how well your DevOps practices are actually working.
