Constrained optimization and Lagrangian duality
In this series (18 parts)
- What is optimization and why ML needs it
- Convex sets and convex functions
- Optimality conditions: first order
- Optimality conditions: second order
- Line search methods
- Least squares: the closed-form solution
- Steepest descent (gradient descent)
- Newton's method for optimization
- Quasi-Newton methods: BFGS and L-BFGS
- Conjugate gradient methods
- Constrained optimization and Lagrangian duality
- KKT conditions
- Penalty and barrier methods
- Interior point methods
- The simplex method
- Frank-Wolfe method
- Optimization in dynamic programming and optimal control
- Stochastic gradient descent and variants
Prerequisites: This article assumes you are comfortable with first-order optimality conditions. You should also know how to compute gradients and understand convex functions.
Why constrained optimization matters
Most real problems have constraints. You want to minimize cost, but your budget is fixed. You want to maximize return, but risk cannot exceed a threshold. Unconstrained methods like gradient descent do not handle these situations directly. You need a framework that bakes constraints into the optimization itself.
Equality constraints and geometric intuition
Consider the basic constrained problem:

minimize f(x) subject to g(x) = 0

You want to minimize f(x), but only among points that satisfy g(x) = 0. The constraint defines a surface (or a curve, in 2D). You are restricted to moving along that surface.
Picture the level curves of f and the constraint surface g(x) = 0. As you walk along the constraint surface, f decreases until you reach a point where the constraint surface is tangent to a level curve of f. At that point, any small step along the constraint either increases f or leaves it unchanged. You have found the optimum.
What does tangency mean mathematically? It means the gradient of f is parallel to the gradient of g at the optimal point. If ∇f had any component along the constraint surface, you could move in that direction and decrease f further. So at the optimum:

∇f(x*) = λ ∇g(x*)

for some scalar λ. This scalar is the Lagrange multiplier.
Contour plot of f(x,y) = x^2 + y^2 with the equality constraint x + y = 2. The constrained minimum occurs at (1,1) where the contour is tangent to the constraint line.
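The tangency condition is easy to check numerically. A minimal sketch with plain NumPy, using the f and g from the figure (f(x, y) = x² + y², g(x, y) = x + y − 2): at the constrained minimum (1, 1), ∇f is a scalar multiple of ∇g.

```python
import numpy as np

def grad_f(p):
    x, y = p
    return np.array([2 * x, 2 * y])  # gradient of x^2 + y^2

def grad_g(p):
    return np.array([1.0, 1.0])  # gradient of x + y - 2 is constant

p_star = np.array([1.0, 1.0])  # the constrained minimum from the figure
gf, gg = grad_f(p_star), grad_g(p_star)

# Two 2D vectors are parallel exactly when their cross product is zero
cross = gf[0] * gg[1] - gf[1] * gg[0]
print(cross)    # 0.0 -> the gradients are parallel
print(gf / gg)  # [2. 2.] -> the multiplier is lambda = 2
```

The componentwise ratio ∇f / ∇g reads off the multiplier directly, which we will meet again as the shadow price below.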
The Lagrangian function
Instead of solving the constrained problem directly, we build a single function that encodes both the objective and the constraint. The Lagrangian is:

L(x, λ) = f(x) − λ g(x)

Here λ is the Lagrange multiplier. Setting the partial derivatives of L to zero gives us the optimality conditions:

∂L/∂x = ∇f(x) − λ ∇g(x) = 0
∂L/∂λ = −g(x) = 0

The first equation says the gradients are parallel (the geometric condition). The second equation says the constraint is satisfied. Together, they form a system of equations you can solve for x and λ.
What the multiplier means
The multiplier λ has a concrete interpretation. It is the shadow price of the constraint: it tells you how much the optimal value of f changes if you relax the constraint slightly. If the constraint is g(x) = c and you change it to g(x) = c + ε, then:

Δf* ≈ λ ε
This is extremely useful in practice. It tells you the “cost” of each constraint, which constraints are worth relaxing, and by how much.
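To see the shadow price concretely, take the figure's problem and let the line move: minimize x² + y² subject to x + y = c. By symmetry the optimum is x = y = c/2, so f*(c) = c²/2, and at c = 2 the multiplier is λ = df*/dc = 2. A quick numerical check of Δf* ≈ λε:

```python
def f_star(c):
    # optimal value of: minimize x^2 + y^2 subject to x + y = c
    # (by symmetry, x = y = c/2)
    return c**2 / 2

c, eps = 2.0, 0.01
lam = 2.0  # the multiplier at c = 2, equal to d f*/dc

actual = f_star(c + eps) - f_star(c)
predicted = lam * eps
print(actual, predicted)  # approximately 0.02005 vs 0.02: equal to first order
```

The small mismatch (the ε²/2 term) is exactly what the "approximately" in Δf* ≈ λε hides.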
Worked example 1: minimizing distance to the origin
Problem: Minimize f(x, y) = x² + y² subject to x + y = 2.
We want the point on the line x + y = 2 that is closest to the origin.
Step 1: write the Lagrangian.

L(x, y, λ) = x² + y² − λ(x + y − 2)

The constraint is g(x, y) = x + y − 2 = 0.
Step 2: take partial derivatives and set them to zero.

∂L/∂x = 2x − λ = 0
∂L/∂y = 2y − λ = 0
∂L/∂λ = −(x + y − 2) = 0

Step 3: solve the system.
From the first equation: x = λ/2.
From the second equation: y = λ/2.
So x = y. Substituting into the constraint:

x + x = 2  ⟹  x = 1

Therefore x = y = 1 and λ = 2.
Step 4: verify and interpret.
The optimal point is (1, 1) with optimal value f(1, 1) = 1² + 1² = 2.
The multiplier λ = 2 tells us: if we change the constraint to x + y = 2 + ε, the optimal value changes by approximately λε = 2ε. Relaxing the constraint (making the line farther from the origin) increases the minimum distance squared, which makes geometric sense.
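If you have SymPy installed, you can solve the stationarity system symbolically instead of by hand; a sketch for the same problem (minimize x² + y² on the line x + y = 2):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
L = x**2 + y**2 - lam * (x + y - 2)  # the Lagrangian of the problem above

# Set all three partial derivatives of L to zero and solve the system
sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sol)  # one solution: x = 1, y = 1, lam = 2
```

This scales to problems where the hand algebra gets tedious, as long as the system stays polynomial.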
The dual problem
The Lagrangian gives us more than just optimality conditions. It opens the door to a completely different way of solving the problem: duality.
Define the dual function:

q(λ) = min_x L(x, λ) = min_x [ f(x) − λ g(x) ]

For each fixed λ, you minimize the Lagrangian over x with no constraints. The result depends only on λ.
Why the dual matters
The dual function provides a lower bound on the optimal value p* of the original (primal) problem. For any λ:

q(λ) ≤ p*

Here is why. Let x* be the primal optimum, so p* = f(x*) and g(x*) = 0. Then:

q(λ) = min_x L(x, λ) ≤ L(x*, λ) = f(x*) − λ g(x*) = f(x*) = p*

The minimum over all x is at most the value at x*, and the constraint term vanishes at x*. This holds for every λ, so we get the tightest bound by maximizing:

d* = max_λ q(λ)

This is the dual problem: maximize q(λ) over λ.
Weak and strong duality
Weak duality says the dual optimal value d* is never larger than the primal optimal value p*:

d* ≤ p*

This always holds, regardless of convexity. The gap p* − d* is called the duality gap.
Strong duality says d* = p*, meaning the duality gap is zero. This does not always hold, but it does hold under nice conditions. For convex problems with equality constraints that are affine (linear), strong duality holds automatically. More generally, Slater's condition guarantees strong duality: if the problem is convex and there exists a point strictly inside the feasible region (a strictly feasible point), then d* = p*.
Why does this matter? When strong duality holds, you can solve the dual problem instead of the primal. Sometimes the dual is easier, especially when it has fewer variables or a simpler structure.
Worked example 2: duality in action
Problem: Minimize f(x, y) = x² + 2y² subject to x + y = 3.
Primal solution
Step 1: write the Lagrangian.

L(x, y, λ) = x² + 2y² − λ(x + y − 3)

Step 2: take partial derivatives.

∂L/∂x = 2x − λ = 0  ⟹  x = λ/2
∂L/∂y = 4y − λ = 0  ⟹  y = λ/4

Step 3: solve.
Substituting x = λ/2 and y = λ/4 into the constraint:

λ/2 + λ/4 = 3  ⟹  3λ/4 = 3  ⟹  λ = 4

Therefore:

x = λ/2 = 2,  y = λ/4 = 1

The optimal value is:

p* = 2² + 2 · 1² = 6
Dual solution
Now let’s solve the dual problem and verify strong duality.
Step 1: compute q(λ).
We already found that for a given λ, the minimizers of L over (x, y) are x = λ/2 and y = λ/4. Substituting back:

q(λ) = (λ/2)² + 2(λ/4)² − λ(λ/2 + λ/4 − 3) = λ²/4 + λ²/8 − 3λ²/4 + 3λ

Combine the terms with a common denominator of 8:

q(λ) = (2λ² + λ² − 6λ²)/8 + 3λ = −3λ²/8 + 3λ
Step 2: maximize q(λ).
Set q′(λ) = 0:

q′(λ) = −3λ/4 + 3 = 0  ⟹  λ = 4

Step 3: compute d*.

d* = q(4) = −3 · 16/8 + 3 · 4 = −6 + 12 = 6

Verification: d* = 6 = p*. Strong duality holds. The dual optimal equals the primal optimal.
This is a convex problem (quadratic objective, linear constraint), so strong duality is guaranteed.
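A quick way to make weak duality tangible is to scan the dual function: every value of q(λ) = −3λ²/8 + 3λ should sit at or below p* = 6, with the maximum at λ = 4. A small NumPy sketch:

```python
import numpy as np

def q(lam):
    # dual function of: minimize x^2 + 2y^2 subject to x + y = 3
    return -3 * lam**2 / 8 + 3 * lam

lams = np.linspace(-10, 10, 2001)  # a grid of multiplier values
vals = q(lams)

print(np.all(vals <= 6.0 + 1e-9))  # True: q(lam) <= p* everywhere (weak duality)
print(lams[np.argmax(vals)])       # maximizer, approximately 4
print(vals.max())                  # maximum, approximately 6 (strong duality)
```

Plotting vals against lams shows the concave parabola just touching the horizontal line at p* = 6.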
Multiple equality constraints
The framework extends naturally. If you have m equality constraints g₁(x) = 0, …, g_m(x) = 0, the Lagrangian becomes:

L(x, λ₁, …, λ_m) = f(x) − λ₁ g₁(x) − ⋯ − λ_m g_m(x)

The optimality conditions are:

∇f(x) = λ₁ ∇g₁(x) + ⋯ + λ_m ∇g_m(x)
g₁(x) = 0, …, g_m(x) = 0

This gives you n + m equations in n + m unknowns (the n components of x plus the m multipliers). When f is quadratic and the constraints are linear, the system is linear and you can solve it with standard linear algebra.
Worked example 3: two equality constraints
Problem: Minimize f(x, y, z) = x² + y² + z² subject to x + y = 2 and y + z = 3.
We want the point closest to the origin that lies on the intersection of two planes.
Step 1: write the Lagrangian.

L(x, y, z, λ₁, λ₂) = x² + y² + z² − λ₁(x + y − 2) − λ₂(y + z − 3)

Step 2: take partial derivatives.

∂L/∂x = 2x − λ₁ = 0
∂L/∂y = 2y − λ₁ − λ₂ = 0
∂L/∂z = 2z − λ₂ = 0

Step 3: express variables in terms of multipliers.
From the first three equations:

x = λ₁/2,  y = (λ₁ + λ₂)/2,  z = λ₂/2

Step 4: substitute into the constraints.
Constraint 1 (x + y = 2):

λ₁/2 + (λ₁ + λ₂)/2 = 2  ⟹  2λ₁ + λ₂ = 4   (i)

Constraint 2 (y + z = 3):

(λ₁ + λ₂)/2 + λ₂/2 = 3  ⟹  λ₁ + 2λ₂ = 6   (ii)
Step 5: solve the 2×2 system.
From (i): λ₂ = 4 − 2λ₁.
Substitute into (ii):

λ₁ + 2(4 − 2λ₁) = 6  ⟹  −3λ₁ = −2  ⟹  λ₁ = 2/3

Then:

λ₂ = 4 − 2 · 2/3 = 8/3

Step 6: recover the optimal point.

x = λ₁/2 = 1/3,  y = (λ₁ + λ₂)/2 = 5/3,  z = λ₂/2 = 4/3
Step 7: verify.
Check constraint 1: 1/3 + 5/3 = 2. ✓
Check constraint 2: 5/3 + 4/3 = 3. ✓
Optimal value:

f* = (1/3)² + (5/3)² + (4/3)² = (1 + 25 + 16)/9 = 14/3 ≈ 4.667

The multiplier λ₁ = 2/3 tells us: relaxing the first constraint (x + y = 2 + ε) by a small ε changes the optimal value by approximately (2/3)ε. Similarly, λ₂ = 8/3 tells us the second constraint is more "expensive" to enforce.
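Because the objective is quadratic and both constraints are linear, the optimality conditions here are themselves a linear system, and the whole example reduces to one call to np.linalg.solve. A sketch, stacking the stationarity conditions 2v − Aᵀλ = 0 on top of the feasibility conditions Av = b:

```python
import numpy as np

# minimize x^2 + y^2 + z^2  subject to  A @ v = b
A = np.array([[1.0, 1.0, 0.0],   # x + y = 2
              [0.0, 1.0, 1.0]])  # y + z = 3
b = np.array([2.0, 3.0])
m, n = A.shape

# Block system:  [2I  -A^T] [v  ]   [0]
#                [A     0 ] [lam] = [b]
K = np.block([[2 * np.eye(n), -A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
v, lam = sol[:n], sol[n:]
print(v)    # approximately [1/3, 5/3, 4/3]
print(lam)  # approximately [2/3, 8/3]
```

This is the same system solved by hand in Steps 3–5, just written as one (n + m) × (n + m) matrix equation.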
Python implementation
Here is how you solve constrained optimization problems with scipy.optimize.minimize. We will solve Example 2: minimize x² + 2y² subject to x + y = 3.
```python
from scipy.optimize import minimize

def objective(vars):
    x, y = vars
    return x**2 + 2 * y**2

def constraint_eq(vars):
    x, y = vars
    return x + y - 3  # must equal zero

result = minimize(
    objective,
    x0=[0.0, 0.0],
    method="SLSQP",
    constraints={"type": "eq", "fun": constraint_eq},
)

print(f"Optimal x = {result.x[0]:.4f}, y = {result.x[1]:.4f}")
print(f"Optimal value = {result.fun:.4f}")
# Output: Optimal x = 2.0000, y = 1.0000
# Output: Optimal value = 6.0000
```
For the two-constraint problem (Example 3):

```python
def objective_3d(vars):
    x, y, z = vars
    return x**2 + y**2 + z**2

constraints = [
    {"type": "eq", "fun": lambda v: v[0] + v[1] - 2},
    {"type": "eq", "fun": lambda v: v[1] + v[2] - 3},
]

result = minimize(
    objective_3d,
    x0=[0.0, 0.0, 0.0],
    method="SLSQP",
    constraints=constraints,
)

print(f"Optimal (x, y, z) = ({result.x[0]:.4f}, {result.x[1]:.4f}, {result.x[2]:.4f})")
print(f"Optimal value = {result.fun:.4f}")
# Output: Optimal (x, y, z) = (0.3333, 1.6667, 1.3333)
# Output: Optimal value = 4.6667
```
The SLSQP method (Sequential Least Squares Programming) handles both equality and inequality constraints. Internally it solves a sequence of least-squares subproblems, building on ideas from least squares optimization.
A brief note on inequality constraints
So far we have only handled equality constraints of the form g(x) = 0. But many real problems have inequality constraints:

h(x) ≤ 0
Inequality constraints introduce additional complexity. The multipliers for inequality constraints must be non-negative, and a complementary slackness condition determines which constraints are “active” at the optimum. These ideas lead to the Karush-Kuhn-Tucker (KKT) conditions, which generalize everything in this article.
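You can already experiment with inequality constraints in scipy before the full KKT treatment: SLSQP accepts "ineq" constraints, with the convention that the constraint function must be ≥ 0. A minimal sketch, minimizing x² + y² subject to x + y ≥ 2 (the constraint turns out to be active at the optimum):

```python
from scipy.optimize import minimize

def objective(v):
    x, y = v
    return x**2 + y**2

# scipy's "ineq" convention: fun(v) >= 0, so x + y >= 2 becomes x + y - 2 >= 0
con = {"type": "ineq", "fun": lambda v: v[0] + v[1] - 2}

result = minimize(objective, x0=[0.0, 0.0], method="SLSQP", constraints=con)
print(result.x)    # approximately [1, 1]: the constraint holds with equality
print(result.fun)  # approximately 2.0
```

Note that the solver lands on the boundary of the feasible region, which is exactly the "active constraint" behavior that complementary slackness formalizes.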
What comes next
This article covered equality-constrained optimization and duality. The natural next step is handling inequality constraints, which requires the KKT conditions. KKT conditions combine the Lagrangian framework with complementary slackness to handle both equality and inequality constraints in a unified way. They are the foundation for understanding support vector machines, interior-point methods, and most modern optimization algorithms.