
Constrained optimization and Lagrangian duality

In this series (18 parts)
  1. What is optimization and why ML needs it
  2. Convex sets and convex functions
  3. Optimality conditions: first order
  4. Optimality conditions: second order
  5. Line search methods
  6. Least squares: the closed-form solution
  7. Steepest descent (gradient descent)
  8. Newton's method for optimization
  9. Quasi-Newton methods: BFGS and L-BFGS
  10. Conjugate gradient methods
  11. Constrained optimization and Lagrangian duality
  12. KKT conditions
  13. Penalty and barrier methods
  14. Interior point methods
  15. The simplex method
  16. Frank-Wolfe method
  17. Optimization in dynamic programming and optimal control
  18. Stochastic gradient descent and variants

Prerequisites: This article assumes you are comfortable with first-order optimality conditions. You should also know how to compute gradients and understand convex functions.

Why constrained optimization matters

Most real problems have constraints. You want to minimize cost, but your budget is fixed. You want to maximize return, but risk cannot exceed a threshold. Unconstrained methods like gradient descent do not handle these situations directly. You need a framework that bakes constraints into the optimization itself.

Equality constraints and geometric intuition

Consider the basic constrained problem:

$$\min_x f(x) \quad \text{subject to} \quad h(x) = 0$$

You want to minimize $f(x)$, but only among points that satisfy $h(x) = 0$. The constraint $h(x) = 0$ defines a surface (or a curve, in 2D). You are restricted to moving along that surface.

Picture the level curves of $f$ and the constraint surface $h(x) = 0$. As you walk along the constraint surface, $f$ decreases until you reach a point where the constraint surface is tangent to a level curve of $f$. At that point, any small step along the constraint either increases $f$ or leaves it unchanged. You have found the optimum.

What does tangency mean mathematically? It means the gradient of $f$ is parallel to the gradient of $h$ at the optimal point. If $\nabla f$ had any component along the constraint surface, you could move in that direction and decrease $f$ further. So at the optimum:

$$\nabla f(x^*) = -\lambda \nabla h(x^*)$$

for some scalar $\lambda$. This scalar is the Lagrange multiplier.

Figure: contour plot of $f(x, y) = x^2 + y^2$ with the equality constraint $x + y = 2$. The constrained minimum occurs at $(1, 1)$, where a contour is tangent to the constraint line.

The Lagrangian function

Instead of solving the constrained problem directly, we build a single function that encodes both the objective and the constraint. The Lagrangian is:

$$L(x, \lambda) = f(x) + \lambda \, h(x)$$

Here $\lambda$ is the Lagrange multiplier. Setting the partial derivatives of $L$ to zero gives us the optimality conditions:

$$\frac{\partial L}{\partial x} = \nabla f(x) + \lambda \nabla h(x) = 0, \qquad \frac{\partial L}{\partial \lambda} = h(x) = 0$$

The first equation says the gradients are parallel (the geometric condition). The second equation says the constraint is satisfied. Together, they form a system of equations you can solve for $x^*$ and $\lambda^*$.

What the multiplier means

The multiplier $\lambda^*$ has a concrete interpretation. It is the shadow price of the constraint: it tells you how much the optimal value of $f$ changes if you relax the constraint slightly. If the constraint is $h(x) = 0$ and you change it to $h(x) = \epsilon$, then:

$$f(x^*_\epsilon) \approx f(x^*) - \lambda^* \epsilon$$

This is extremely useful in practice. It tells you the “cost” of each constraint, which constraints are worth relaxing, and by how much.

Worked example 1: minimizing distance to the origin

Problem: Minimize $f(x, y) = x^2 + y^2$ subject to $x + y = 1$.

We want the point on the line $x + y = 1$ that is closest to the origin.

Step 1: write the Lagrangian.

The constraint is $h(x, y) = x + y - 1 = 0$.

$$L(x, y, \lambda) = x^2 + y^2 + \lambda(x + y - 1)$$

Step 2: take partial derivatives and set them to zero.

$$\frac{\partial L}{\partial x} = 2x + \lambda = 0, \qquad \frac{\partial L}{\partial y} = 2y + \lambda = 0, \qquad \frac{\partial L}{\partial \lambda} = x + y - 1 = 0$$

Step 3: solve the system.

From the first equation: $x = -\lambda/2$.

From the second equation: $y = -\lambda/2$.

So $x = y$. Substituting into the constraint:

$$x + x = 1 \implies 2x = 1 \implies x = \frac{1}{2}$$

Therefore $y = 1/2$ and $\lambda = -2x = -1$.

Step 4: verify and interpret.

The optimal point is $(x^*, y^*) = (1/2, 1/2)$ with optimal value $f^* = (1/2)^2 + (1/2)^2 = 1/2$.

The multiplier $\lambda^* = -1$ tells us: if we change the constraint to $x + y = 1 + \epsilon$, the optimal value changes by approximately $-(-1)\epsilon = \epsilon$. Relaxing the constraint (making the line farther from the origin) increases the minimum distance squared, which makes geometric sense.
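The small system above is easy to check symbolically. Here is a minimal sketch with sympy (assuming it is available; the variable names are ours):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)

# Lagrangian for Example 1: f = x^2 + y^2, constraint h = x + y - 1.
L = x**2 + y**2 + lam * (x + y - 1)

# Stationarity in x and y, plus the constraint (dL/dlam = 0).
eqs = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(eqs, [x, y, lam], dict=True)[0]

print(sol[x], sol[y], sol[lam])  # 1/2 1/2 -1
print(sol[x]**2 + sol[y]**2)     # 1/2
```

Because the objective is quadratic and the constraint is linear, the stationarity conditions form a linear system, so `sp.solve` returns the unique solution exactly.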

The dual problem

The Lagrangian gives us more than just optimality conditions. It opens the door to a completely different way of solving the problem: duality.

Define the dual function:

$$g(\lambda) = \min_x L(x, \lambda) = \min_x \left[ f(x) + \lambda \, h(x) \right]$$

For each fixed $\lambda$, you minimize the Lagrangian over $x$ with no constraints. The result $g(\lambda)$ depends only on $\lambda$.

Why the dual matters

The dual function provides a lower bound on the optimal value of the original (primal) problem. For any $\lambda$:

$$g(\lambda) \leq f(x^*)$$

Here is why. Let $x^*$ be the primal optimum, so $h(x^*) = 0$. Then:

$$g(\lambda) = \min_x \left[ f(x) + \lambda \, h(x) \right] \leq f(x^*) + \lambda \, h(x^*) = f(x^*)$$

The minimum over all $x$ is at most the value at $x^*$, and the constraint term vanishes at $x^*$. This holds for every $\lambda$, so we get the tightest bound by maximizing:

$$\max_\lambda \, g(\lambda) \leq f(x^*)$$

This is the dual problem: maximize $g(\lambda)$ over $\lambda$.
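You can see the lower bound concretely by evaluating $g(\lambda)$ with an unconstrained solver for the inner minimization. A sketch using Example 1 from earlier ($f = x^2 + y^2$, $h = x + y - 1$, $p^* = 1/2$), with scipy doing the inner minimization:

```python
from scipy.optimize import minimize

# Example 1: f(x, y) = x^2 + y^2, h(x, y) = x + y - 1, primal optimum p* = 1/2.
def g(lam):
    # Inner unconstrained minimization of the Lagrangian over (x, y).
    lagrangian = lambda v: v[0]**2 + v[1]**2 + lam * (v[0] + v[1] - 1)
    return minimize(lagrangian, x0=[0.0, 0.0]).fun

p_star = 0.5
for lam in [-2.0, -1.0, 0.0, 1.0]:
    print(f"g({lam:+.1f}) = {g(lam):+.4f} <= p* = {p_star}")
```

Every value of $g(\lambda)$ stays below $p^* = 1/2$, and the bound is tight at $\lambda = -1$, the multiplier we found when solving the primal.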

Weak and strong duality

Weak duality says the dual optimal value $d^*$ is never larger than the primal optimal value $p^*$:

$$d^* = \max_\lambda \, g(\lambda) \leq \min_{x:\, h(x)=0} f(x) = p^*$$

This always holds, regardless of convexity. The gap $p^* - d^*$ is called the duality gap.

Strong duality says $d^* = p^*$, meaning the duality gap is zero. This does not always hold, but it does hold under nice conditions. For convex problems with equality constraints that are affine (linear), strong duality holds automatically. More generally, Slater’s condition guarantees strong duality: if the problem is convex and there exists a point strictly inside the feasible region (a strictly feasible point), then $d^* = p^*$.

Why does this matter? When strong duality holds, you can solve the dual problem instead of the primal. Sometimes the dual is easier, especially when it has fewer variables or a simpler structure.

Worked example 2: duality in action

Problem: Minimize $f(x, y) = x^2 + 2y^2$ subject to $x + y = 3$.

Primal solution

Step 1: write the Lagrangian.

$$L(x, y, \lambda) = x^2 + 2y^2 + \lambda(x + y - 3)$$

Step 2: take partial derivatives.

$$\frac{\partial L}{\partial x} = 2x + \lambda = 0 \implies x = -\frac{\lambda}{2}, \qquad \frac{\partial L}{\partial y} = 4y + \lambda = 0 \implies y = -\frac{\lambda}{4}, \qquad \frac{\partial L}{\partial \lambda} = x + y - 3 = 0$$

Step 3: solve.

Substituting $x$ and $y$ into the constraint:

$$-\frac{\lambda}{2} - \frac{\lambda}{4} = 3 \implies -\frac{3\lambda}{4} = 3 \implies \lambda^* = -4$$

Therefore:

$$x^* = -\frac{-4}{2} = 2, \quad y^* = -\frac{-4}{4} = 1$$

The optimal value is:

$$p^* = f(2, 1) = 2^2 + 2(1)^2 = 4 + 2 = 6$$

Dual solution

Now let’s solve the dual problem and verify strong duality.

Step 1: compute $g(\lambda)$.

We already found that for a given $\lambda$, the minimizers of $L$ over $(x, y)$ are $x = -\lambda/2$ and $y = -\lambda/4$. Substituting back:

$$g(\lambda) = \left(-\frac{\lambda}{2}\right)^2 + 2\left(-\frac{\lambda}{4}\right)^2 + \lambda\left(-\frac{\lambda}{2} - \frac{\lambda}{4} - 3\right) = \frac{\lambda^2}{4} + 2 \cdot \frac{\lambda^2}{16} + \lambda\left(-\frac{3\lambda}{4} - 3\right) = \frac{\lambda^2}{4} + \frac{\lambda^2}{8} - \frac{3\lambda^2}{4} - 3\lambda$$

Combine the $\lambda^2$ terms with a common denominator of 8:

$$= \frac{2\lambda^2}{8} + \frac{\lambda^2}{8} - \frac{6\lambda^2}{8} - 3\lambda = -\frac{3\lambda^2}{8} - 3\lambda$$

Step 2: maximize $g(\lambda)$.

$$g'(\lambda) = -\frac{6\lambda}{8} - 3 = -\frac{3\lambda}{4} - 3$$

Set $g'(\lambda) = 0$:

$$-\frac{3\lambda}{4} = 3 \implies \lambda^* = -4$$

Step 3: compute $d^*$.

$$d^* = g(-4) = \frac{-3(16)}{8} - 3(-4) = -6 + 12 = 6$$

Verification: $d^* = 6 = p^*$. Strong duality holds. The dual optimal equals the primal optimal.

This is a convex problem (quadratic objective, linear constraint), so strong duality is guaranteed.
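The dual maximization can also be done numerically. A small sketch with scipy, using the closed-form dual $g(\lambda) = -3\lambda^2/8 - 3\lambda$ derived above and maximizing it by minimizing $-g$:

```python
from scipy.optimize import minimize_scalar

# Closed-form dual function from Example 2.
def g(lam):
    return -3 * lam**2 / 8 - 3 * lam

# Maximize g by minimizing its negation.
res = minimize_scalar(lambda lam: -g(lam))
print(f"lambda* = {res.x:.4f}")  # close to -4
print(f"d* = {-res.fun:.4f}")    # close to 6, matching p*
```

Since $g$ is a one-dimensional concave quadratic, `minimize_scalar` converges to the unique maximizer without needing a starting bracket.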

Multiple equality constraints

The framework extends naturally. If you have $m$ equality constraints $h_1(x) = 0, \ldots, h_m(x) = 0$, the Lagrangian becomes:

$$L(x, \lambda_1, \ldots, \lambda_m) = f(x) + \sum_{i=1}^{m} \lambda_i \, h_i(x)$$

The optimality conditions are:

$$\nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla h_i(x) = 0, \qquad h_i(x) = 0 \quad \text{for } i = 1, \ldots, m$$

This gives you $n + m$ equations in $n + m$ unknowns (the $n$ components of $x$ plus the $m$ multipliers). When $f$ is quadratic and the constraints are linear, the system is linear and you can solve it with standard linear algebra.
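Here is a minimal sketch of that observation for objectives of the form $x^\top Q x$ with constraints $Ax = b$: stationarity and feasibility stack into one linear system. The helper name `kkt_solve` is ours, not a library function:

```python
import numpy as np

def kkt_solve(Q, A, b):
    """Solve min x^T Q x subject to A x = b via the stacked linear system.

    Stationarity: 2 Q x + A^T lam = 0.  Feasibility: A x = b.
    """
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[2 * Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]  # (x*, lambda*)

# Example 3 below: f = x^2 + y^2 + z^2 (Q = I), constraints x + y = 2, y + z = 3.
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
x_star, lam_star = kkt_solve(np.eye(3), A, np.array([2.0, 3.0]))
print(x_star)    # approx [0.3333, 1.6667, 1.3333]
print(lam_star)  # approx [-0.6667, -2.6667]
```

One `np.linalg.solve` call recovers both the optimal point and the multipliers at once.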

Worked example 3: two equality constraints

Problem: Minimize $f(x, y, z) = x^2 + y^2 + z^2$ subject to $x + y = 2$ and $y + z = 3$.

We want the point closest to the origin that lies on the intersection of two planes.

Step 1: write the Lagrangian.

$$L = x^2 + y^2 + z^2 + \lambda_1(x + y - 2) + \lambda_2(y + z - 3)$$

Step 2: take partial derivatives.

$$\frac{\partial L}{\partial x} = 2x + \lambda_1 = 0, \qquad \frac{\partial L}{\partial y} = 2y + \lambda_1 + \lambda_2 = 0, \qquad \frac{\partial L}{\partial z} = 2z + \lambda_2 = 0$$
$$\frac{\partial L}{\partial \lambda_1} = x + y - 2 = 0, \qquad \frac{\partial L}{\partial \lambda_2} = y + z - 3 = 0$$

Step 3: express variables in terms of multipliers.

From the first three equations:

$$x = -\frac{\lambda_1}{2}, \quad y = -\frac{\lambda_1 + \lambda_2}{2}, \quad z = -\frac{\lambda_2}{2}$$

Step 4: substitute into the constraints.

Constraint 1 ($x + y = 2$):

$$-\frac{\lambda_1}{2} - \frac{\lambda_1 + \lambda_2}{2} = 2 \implies -\frac{2\lambda_1 + \lambda_2}{2} = 2 \implies 2\lambda_1 + \lambda_2 = -4 \quad \cdots (i)$$

Constraint 2 ($y + z = 3$):

$$-\frac{\lambda_1 + \lambda_2}{2} - \frac{\lambda_2}{2} = 3 \implies -\frac{\lambda_1 + 2\lambda_2}{2} = 3 \implies \lambda_1 + 2\lambda_2 = -6 \quad \cdots (ii)$$

Step 5: solve the 2×2 system.

From (i): $\lambda_2 = -4 - 2\lambda_1$.

Substitute into (ii):

$$\lambda_1 + 2(-4 - 2\lambda_1) = -6 \implies \lambda_1 - 8 - 4\lambda_1 = -6 \implies -3\lambda_1 = 2 \implies \lambda_1 = -\frac{2}{3}$$

Then:

$$\lambda_2 = -4 - 2\left(-\frac{2}{3}\right) = -4 + \frac{4}{3} = -\frac{8}{3}$$

Step 6: recover the optimal point.

$$x^* = -\frac{-2/3}{2} = \frac{1}{3}, \qquad y^* = -\frac{(-2/3) + (-8/3)}{2} = \frac{10}{6} = \frac{5}{3}, \qquad z^* = -\frac{-8/3}{2} = \frac{8}{6} = \frac{4}{3}$$

Step 7: verify.

Check constraint 1: $x^* + y^* = 1/3 + 5/3 = 6/3 = 2$. ✓

Check constraint 2: $y^* + z^* = 5/3 + 4/3 = 9/3 = 3$. ✓

Optimal value:

$$f^* = \left(\frac{1}{3}\right)^2 + \left(\frac{5}{3}\right)^2 + \left(\frac{4}{3}\right)^2 = \frac{1}{9} + \frac{25}{9} + \frac{16}{9} = \frac{42}{9} = \frac{14}{3} \approx 4.667$$

The multiplier $\lambda_1 = -2/3$ tells us: relaxing the first constraint ($x + y = 2$) by a small $\epsilon$ changes the optimal value by approximately $(2/3)\epsilon$. Similarly, $\lambda_2 = -8/3$ tells us the second constraint is more “expensive” to enforce.
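You can check this shadow-price prediction numerically: re-solve the problem with the first constraint relaxed to $x + y = 2 + \epsilon$ and take a finite difference. A sketch using scipy's SLSQP solver (the helper name `solve_relaxed` is ours):

```python
from scipy.optimize import minimize

def solve_relaxed(eps):
    # Example 3 with the first constraint relaxed to x + y = 2 + eps.
    cons = [
        {"type": "eq", "fun": lambda v: v[0] + v[1] - (2 + eps)},
        {"type": "eq", "fun": lambda v: v[1] + v[2] - 3},
    ]
    res = minimize(lambda v: v[0]**2 + v[1]**2 + v[2]**2,
                   x0=[0.0, 0.0, 0.0], method="SLSQP", constraints=cons)
    return res.fun

eps = 1e-3
rate = (solve_relaxed(eps) - solve_relaxed(0.0)) / eps
print(f"observed rate {rate:.4f} vs predicted -lambda_1 = {2/3:.4f}")
```

The finite-difference rate lands close to $-\lambda_1 = 2/3$, matching the shadow-price formula $f(x^*_\epsilon) \approx f(x^*) - \lambda^* \epsilon$.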

Python implementation

Here is how you solve constrained optimization problems with scipy.optimize.minimize. We will solve Example 2: minimize $x^2 + 2y^2$ subject to $x + y = 3$.

import numpy as np
from scipy.optimize import minimize

def objective(vars):
    x, y = vars
    return x**2 + 2 * y**2

def constraint_eq(vars):
    x, y = vars
    return x + y - 3  # must equal zero

result = minimize(
    objective,
    x0=[0.0, 0.0],
    method="SLSQP",
    constraints={"type": "eq", "fun": constraint_eq},
)

print(f"Optimal x = {result.x[0]:.4f}, y = {result.x[1]:.4f}")
print(f"Optimal value = {result.fun:.4f}")
# Output: Optimal x = 2.0000, y = 1.0000
# Output: Optimal value = 6.0000

For the two-constraint problem (Example 3):

def objective_3d(vars):
    x, y, z = vars
    return x**2 + y**2 + z**2

constraints = [
    {"type": "eq", "fun": lambda v: v[0] + v[1] - 2},
    {"type": "eq", "fun": lambda v: v[1] + v[2] - 3},
]

result = minimize(
    objective_3d,
    x0=[0.0, 0.0, 0.0],
    method="SLSQP",
    constraints=constraints,
)

print(f"Optimal (x, y, z) = ({result.x[0]:.4f}, {result.x[1]:.4f}, {result.x[2]:.4f})")
print(f"Optimal value = {result.fun:.4f}")
# Output: Optimal (x, y, z) = (0.3333, 1.6667, 1.3333)
# Output: Optimal value = 4.6667

The SLSQP method (Sequential Least Squares Programming) handles both equality and inequality constraints. Internally it solves a sequence of quadratic subproblems via linear least squares, building on ideas from least squares optimization.

A brief note on inequality constraints

So far we have only handled equality constraints of the form $h(x) = 0$. But many real problems have inequality constraints:

$$\min_x f(x) \quad \text{subject to} \quad g(x) \leq 0, \quad h(x) = 0$$

Inequality constraints introduce additional complexity. The multipliers for inequality constraints must be non-negative, and a complementary slackness condition determines which constraints are “active” at the optimum. These ideas lead to the Karush-Kuhn-Tucker (KKT) conditions, which generalize everything in this article.
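scipy's SLSQP already handles inequality constraints, so you can experiment before the formal treatment. One caution on conventions: a constraint dict of type `"ineq"` means `fun(x) >= 0`, the opposite sign of the $g(x) \leq 0$ form above. A minimal sketch, minimizing $x^2 + y^2$ subject to $x + y \geq 1$:

```python
from scipy.optimize import minimize

# Minimize x^2 + y^2 subject to x + y >= 1.
# In the g(x) <= 0 convention this is g(x, y) = 1 - x - y <= 0;
# scipy's "ineq" type expects fun(v) >= 0, so we flip the sign.
result = minimize(
    lambda v: v[0]**2 + v[1]**2,
    x0=[0.0, 0.0],
    method="SLSQP",
    constraints={"type": "ineq", "fun": lambda v: v[0] + v[1] - 1},
)
print(result.x)  # approx [0.5, 0.5]
```

The constraint is active at the solution (the unconstrained minimum at the origin violates it), so the optimizer lands on the boundary line $x + y = 1$, at the same point as equality-constrained Example 1.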

What comes next

This article covered equality-constrained optimization and duality. The natural next step is handling inequality constraints, which requires the KKT conditions. KKT conditions combine the Lagrangian framework with complementary slackness to handle both equality and inequality constraints in a unified way. They are the foundation for understanding support vector machines, interior-point methods, and most modern optimization algorithms.
