Interior point methods
In this series (18 parts)
- What is optimization and why ML needs it
- Convex sets and convex functions
- Optimality conditions: first order
- Optimality conditions: second order
- Line search methods
- Least squares: the closed-form solution
- Steepest descent (gradient descent)
- Newton's method for optimization
- Quasi-Newton methods: BFGS and L-BFGS
- Conjugate gradient methods
- Constrained optimization and Lagrangian duality
- KKT conditions
- Penalty and barrier methods
- Interior point methods
- The simplex method
- Frank-Wolfe method
- Optimization in dynamic programming and optimal control
- Stochastic gradient descent and variants
Prerequisites
This article builds directly on penalty and barrier methods. You should understand the log barrier function and how it keeps iterates in the interior of the feasible region. Familiarity with KKT conditions and Newton’s method is essential.
From barrier method to interior point method
The log barrier method from the previous article solves a sequence of unconstrained problems, each with a different barrier parameter t. An interior point method takes this idea and makes it practical by using Newton's method to solve each barrier subproblem, then increasing t in a controlled way.
The result is an algorithm with a provable iteration bound: for a linear program with m inequality constraints, you need O(√m · log(m/ε)) Newton steps to reach a solution with duality gap below ε. That is a remarkable guarantee.
The central path
Consider a convex optimization problem:

minimize f(x)
subject to fᵢ(x) ≤ 0, i = 1, …, m

where f and all fᵢ are convex. The barrier problem is:

minimize t·f(x) + φ(x), where φ(x) = −Σᵢ log(−fᵢ(x))

Here we use the convention where t multiplies the objective (equivalent to the formulation in the previous article, just reparametrized). For each value of t > 0, the barrier problem has a unique minimizer x*(t), assuming strict feasibility.
The set of points {x*(t) : t > 0} forms the central path. It is a smooth curve that:
- Starts deep in the interior (small t, barrier dominates)
- Ends at the constrained optimum (as t → ∞, objective dominates)
- Passes through the “analytic center” of the feasible region as t → 0
```mermaid
graph LR
    A["Analytic center<br/>(t small)"] -->|"Central path"| B["Near optimum<br/>(t large)"]
    B --> C["Constrained optimum<br/>(t → ∞)"]
```
KKT interpretation
At each point on the central path, the optimality condition ∇[t·f(x) + φ(x)] = 0 gives:

t·∇f(x*(t)) + Σᵢ (1/(−fᵢ(x*(t))))·∇fᵢ(x*(t)) = 0

Define λᵢ*(t) = 1/(−t·fᵢ(x*(t))). Then the central path point satisfies:

∇f(x*(t)) + Σᵢ λᵢ*(t)·∇fᵢ(x*(t)) = 0

Compare with the exact KKT conditions, which require λᵢ·fᵢ(x) = 0 for every i. The central path satisfies a “relaxed” complementary slackness: the product −λᵢ*(t)·fᵢ(x*(t)) equals 1/t instead of zero. As t → ∞, this approaches exact complementarity.

The duality gap at a central path point is:

m/t

where m is the number of constraints. This gives a direct measure of suboptimality: f(x*(t)) is within m/t of the optimal value.
Example 1: Central path for a 2D linear program
Problem:

minimize −x − y
subject to: x ≥ 0, y ≥ 0, x + y ≤ 4

The optimum is at (0, 4) or (4, 0) or anywhere on the edge x + y = 4: since the objective is −(x + y), both variables contribute equally, so the optimum is any point on the line segment from (0, 4) to (4, 0). The optimal value is −4.
Central path for the LP min −x − y subject to x ≥ 0, y ≥ 0, x + y ≤ 4. The path curves through the interior of the feasible region toward the midpoint (2, 2) of the optimal edge.
Barrier formulation (with three constraints x ≥ 0, y ≥ 0, x + y ≤ 4):

minimize t·(−x − y) − log x − log y − log(4 − x − y)

By symmetry of the objective and constraints in x and y, the central path satisfies x = y. Setting x = y, the one-dimensional optimality condition (after dividing by 2) is:

−t − 1/x + 1/(4 − 2x) = 0
For t = 1:

We need −1 − 1/x + 1/(4 − 2x) = 0. Multiply through by x(4 − 2x):

−x(4 − 2x) − (4 − 2x) + x = 0 ⟹ 2x² − x − 4 = 0 ⟹ x = (1 + √33)/4 ≈ 1.686

Central path point: (1.686, 1.686). Objective: −3.372. Duality gap: m/t = 3/1 = 3.
For t = 10:

−10 − 1/x + 1/(4 − 2x) = 0. Multiply by x(4 − 2x):

20x² − 37x − 4 = 0 ⟹ x = (37 + √1689)/40 ≈ 1.952

Central path point: (1.952, 1.952). Objective: −3.905. Duality gap: 3/10 = 0.3.
For t = 100:

Following the same algebra (or noting the pattern 2t·x² − (4t − 3)·x − 4 = 0), x ≈ 1.995. Central path point: (1.995, 1.995). Objective: −3.990. Duality gap: 3/100 = 0.03.
The central path approaches (2, 2) as t → ∞, which is the point on the optimal face closest to the analytic center.
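These central-path points are easy to compute in a few lines. Clearing denominators in the symmetric stationarity condition −t − 1/x + 1/(4 − 2x) = 0 for the Example 1 LP gives the quadratic 2t·x² − (4t − 3)·x − 4 = 0, whose positive root is the central path point (a small sketch, nothing beyond the standard library):

```python
import math

def central_path_point(t):
    """Central path of min -x - y s.t. x >= 0, y >= 0, x + y <= 4.
    By symmetry x = y on the path, and x solves 2t*x^2 - (4t - 3)*x - 4 = 0."""
    a, b, c = 2 * t, -(4 * t - 3), -4
    # positive root of the quadratic (the other root is negative, hence infeasible)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

for t in [1, 10, 100, 1000]:
    x = central_path_point(t)
    print(f"t={t:5d}  x=y={x:.4f}  objective={-2 * x:.4f}  gap bound={3 / t}")
```

As t grows, x = y approaches 2 and the objective approaches −4, matching the hand computation.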
The barrier method algorithm
Input: strictly feasible x, initial t > 0, growth factor κ > 1, tolerance ε
Repeat:
1. Centering step: starting from x, run Newton's method to minimize
t·f(x) + φ(x), obtaining x*(t)
2. Update: x ← x*(t)
3. Stopping test: if m/t < ε, return x
4. Increase: t ← κ·t
The outer loop (steps 1 to 4) runs ⌈log(m/(t₀·ε))/log κ⌉ times. Each centering step takes a bounded number of Newton iterations (typically 5 to 15 in practice).
Choosing κ
- Small κ (like 1.2): many outer iterations, but each centering step needs few Newton steps.
- Large κ (like 100): few outer iterations, but each centering step needs many Newton steps.
- Theory suggests κ ≈ 1 + 1/√m balances these, but in practice κ between 2 and 20 works well.
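The trade-off can be quantified. With the stopping test m/t < ε and growth factor κ, the outer loop runs until t exceeds m/ε, which takes ⌈log(m/(t₀·ε))/log κ⌉ iterations — a quick sketch:

```python
import math

def outer_iterations(m, t0, eps, kappa):
    """Outer iterations of the barrier method: t grows by a factor kappa
    each round until m/t < eps, i.e. until t > m/eps."""
    return math.ceil(math.log(m / (t0 * eps)) / math.log(kappa))

for kappa in [1.2, 2, 10, 100]:
    print(f"kappa={kappa:6}  outer iterations={outer_iterations(100, 1.0, 1e-8, kappa)}")
```

Small κ drives the count into the hundreds; the total Newton-step count also depends on how many inner iterations each centering step needs, which is what makes moderate κ the practical sweet spot.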
Example 2: Tracing the barrier method on a small LP
Problem (in standard inequality form):

minimize −2x₁ − x₂
subject to: x₁ ≥ 0, x₂ ≥ 0, x₁ + x₂ ≤ 4, x₁ ≤ 3, x₂ ≤ 3

There are m = 5 inequality constraints. The optimum is at (3, 1) with objective −7.
Iteration 1: t = 1

The barrier problem balances the objective against all five log barrier terms. Newton’s method finds the central path point. Due to symmetry-breaking from the objective (x₁ pulls harder than x₂), the solution favors larger x₁.

Numerical solution: x*(1) ≈ (2.12, 1.14). Objective: −5.37. Duality gap: 5/1 = 5.
Iteration 2: t = 10

The objective dominates more. Newton’s method (warm-started from the previous solution) converges in about 6 steps.

Numerical solution: x*(10) ≈ (2.90, 1.01). Objective: −6.80. Duality gap: 5/10 = 0.5.
Iteration 3: t = 100

Now the barrier has very little influence. The solution is pushed close to the vertex.

Numerical solution: x*(100) ≈ (2.99, 1.00). Objective: −6.98. Duality gap: 5/100 = 0.05.
Iteration 4: t = 1000

Numerical solution: x*(1000) ≈ (2.999, 1.000). Objective: −6.998. Duality gap: 5/1000 = 0.005.
Total Newton steps across all iterations: about 25. Compare this with the simplex method, which would find the exact vertex solution in 2 to 5 pivots for this small problem. Interior point methods shine on much larger problems.
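A trace like this is easy to reproduce. The snippet below runs a damped-Newton centering loop on an assumed LP instance with two variables and five constraints (illustrative numbers chosen here, not necessarily the article's exact problem):

```python
import numpy as np

# Assumed LP instance (illustrative):
#   min -2*x1 - x2  s.t.  x1 >= 0, x2 >= 0, x1 + x2 <= 4, x1 <= 3, x2 <= 3
c = np.array([-2.0, -1.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 4.0, 3.0, 3.0])

def center(t, x, iters=100):
    """Damped Newton on the barrier objective t*c@x - sum(log(b - A@x))."""
    for _ in range(iters):
        s = b - A @ x                              # slacks, strictly positive
        grad = t * c + A.T @ (1.0 / s)
        H = A.T @ ((1.0 / s**2)[:, None] * A)
        dx = np.linalg.solve(H, -grad)
        step = 1.0
        while (b - A @ (x + step * dx) <= 0).any():  # backtrack to stay feasible
            step *= 0.5
        x = x + 0.9 * step * dx
        if np.linalg.norm(dx) < 1e-12:
            break
    return x

x = np.array([1.0, 1.0])   # strictly feasible start
pts = {}
for t in [1, 10, 100, 1000]:
    x = center(t, x)       # warm start from the previous center
    pts[t] = x.copy()
    print(f"t={t:5d}  x*(t)=({x[0]:.3f}, {x[1]:.3f})  gap bound={5 / t}")
```

The iterates march along the central path toward the vertex (3, 1), with the guaranteed gap m/t shrinking by a factor of 10 per outer iteration.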
Primal-dual interior point methods
The basic barrier method works, but primal-dual methods are what people actually use in production solvers (CPLEX, Gurobi, MOSEK). The idea: instead of solving the barrier subproblem exactly, solve the KKT system for the barrier problem approximately, updating both primal variables and dual variables simultaneously.
The modified KKT system
For the barrier problem, the KKT conditions are:

∇f(x) + Σᵢ λᵢ·∇fᵢ(x) = 0
−λᵢ·fᵢ(x) = 1/t, i = 1, …, m
fᵢ(x) < 0, λᵢ > 0

In matrix form, define sᵢ = −fᵢ(x) (the slacks). Then the second condition becomes λᵢ·sᵢ = 1/t for all i, or equivalently ΛS𝟙 = (1/t)𝟙, where Λ = diag(λ), S = diag(s), and 𝟙 is the all-ones vector.
A Newton step on this system gives the primal-dual search direction. The key advantage over the basic barrier method: you solve one linear system per iteration instead of running Newton to convergence on the barrier subproblem.
For linear programs
When f(x) = cᵀx and the constraints are Ax ≤ b (slacks s = b − Ax), the system simplifies considerably. The Newton step reduces to solving:

Aᵀ·Δλ = −r_d
A·Δx + Δs = −r_p
S·Δλ + Λ·Δs = −r_c

where r_d = c + Aᵀλ, r_p = Ax + s − b, and r_c = ΛS𝟙 − σμ𝟙 are the residuals for dual feasibility, primal feasibility, and centering respectively (σ ∈ (0, 1) is a centering parameter and μ = sᵀλ/m is the current duality measure).
Example 3: One primal-dual step on a tiny LP
Problem:

minimize −x₁ − 2x₂
subject to: x₁ + x₂ ≤ 2, x₂ ≤ 1.5, x₁ ≥ 0, x₂ ≥ 0

Written in standard inequality form Ax ≤ b with slacks s₁, s₂, s₃, s₄, one per constraint. The optimum is at (0.5, 1.5) with objective −3.5.

Current point (feasible, not optimal): x = (1, 0.5), so s = b − Ax = (0.5, 1, 1, 0.5).

Set λ = (1, 1, 1, 1) (rough initial guess). The centering parameter is σ = 0.1, and μ = sᵀλ/m = 0.75.

Target: λᵢ·sᵢ = σμ = 0.075 for each constraint.
Centering residual for each constraint: r_c = ΛS𝟙 − σμ𝟙 = (0.425, 0.925, 0.925, 0.425).
Dual residual (gradient of the Lagrangian): r_d = c + Aᵀλ. The sign convention depends on writing every constraint as a row of Ax ≤ b with s = b − Ax. Here the constraint matrix is

A = [[1, 1], [0, 1], [−1, 0], [0, −1]]

where the last two rows handle x₁ ≥ 0 and x₂ ≥ 0 as −x₁ ≤ 0 and −x₂ ≤ 0. With λ = 𝟙 this gives r_d = (−1, −2) + (0, 1) = (−1, −1).
With this setup, the dual residual is nonzero, indicating that the current λ is not dual feasible. The Newton system produces a direction that simultaneously improves primal optimality, dual feasibility, and centering.
After solving the linear system and taking a step with an appropriate step size (ensuring that s and λ remain positive), we get a new iterate closer to the optimum at (0.5, 1.5).
The main takeaway: each primal-dual iteration solves one linear system and makes progress on all fronts simultaneously.
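Here is a minimal sketch of one primal-dual step on a tiny LP. The instance, starting point, initial multipliers, and σ = 0.1 are all assumptions chosen for illustration:

```python
import numpy as np

# Assumed tiny LP (illustrative numbers):
#   min  -x1 - 2*x2   s.t.  x1 + x2 <= 2,  x2 <= 1.5,  x1 >= 0,  x2 >= 0
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 1.5, 0.0, 0.0])

x = np.array([1.0, 0.5])  # strictly feasible, not optimal
lam = np.ones(4)          # rough initial dual guess
s = b - A @ x             # slacks (0.5, 1.0, 1.0, 0.5)

sigma = 0.1               # centering parameter
mu = lam @ s / len(s)     # duality measure

r_d = c + A.T @ lam       # dual residual (nonzero: lam is not dual feasible)
r_p = A @ x + s - b       # primal residual (zero by construction)
r_c = lam * s - sigma * mu

# Newton system with ds eliminated via ds = -r_p - A @ dx:
#   A^T dlam            = -r_d
#   -Lam A dx + S dlam  = -r_c + Lam r_p
n, m = len(x), len(s)
K = np.zeros((n + m, n + m))
K[:n, n:] = A.T
K[n:, :n] = -lam[:, None] * A
K[n:, n:] = np.diag(s)
rhs = np.concatenate([-r_d, -r_c + lam * r_p])
sol = np.linalg.solve(K, rhs)
dx, dlam = sol[:n], sol[n:]
ds = -r_p - A @ dx

def max_step(v, dv, eta=0.95):
    """Largest alpha in (0, 1] keeping v + alpha*dv strictly positive."""
    neg = dv < 0
    return 1.0 if not neg.any() else min(1.0, eta * float(np.min(-v[neg] / dv[neg])))

alpha = min(max_step(s, ds), max_step(lam, dlam))
x, lam = x + alpha * dx, lam + alpha * dlam
s = b - A @ x
print("alpha:", alpha, " new x:", x, " objective:", c @ x, " mu:", lam @ s / m)
```

The fraction-to-boundary rule (η = 0.95) is what keeps s and λ strictly positive; the single linear solve improves the objective and shrinks the duality measure at the same time.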
Self-concordant barriers
The complexity analysis of interior point methods relies on a special property of the barrier function called self-concordance. A function f is self-concordant if:

|f‴(x)| ≤ 2·f″(x)^(3/2)

in one dimension, with a generalization to higher dimensions. The log barrier is self-concordant, and sums of self-concordant functions are self-concordant.
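As a quick check of this inequality for the basic log barrier term f(x) = −log x (a standard one-line computation):

```latex
f(x) = -\log x, \qquad
f''(x) = \frac{1}{x^{2}}, \qquad
f'''(x) = -\frac{2}{x^{3}}
\quad\Longrightarrow\quad
\lvert f'''(x)\rvert = \frac{2}{x^{3}} = 2\left(\frac{1}{x^{2}}\right)^{3/2} = 2\,f''(x)^{3/2}
```

so −log x satisfies the self-concordance bound with equality for every x > 0.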
Why does this matter? Self-concordance guarantees that Newton’s method converges quadratically from any point in a well-defined neighborhood of the minimizer, without needing a line search. This is what gives interior point methods their theoretical complexity bounds.
The complexity parameter ν of a self-concordant barrier equals the number of constraints m for the standard log barrier. Better barriers with smaller ν exist for specific constraint structures (second-order cones, semidefinite constraints).
Complexity comparison
| Method | Iterations for LP with m constraints, n variables |
|---|---|
| Simplex | Worst case exponential, average polynomial, very fast in practice |
| Basic barrier | O(√m · log(m/ε)) Newton steps (with κ near 1 + 1/√m) |
| Primal-dual IP | O(√m · log(1/ε)) iterations; typically 10 to 50 in practice |
Each interior point iteration costs roughly O(m·n² + n³) for dense problems (forming and factoring the Newton system), much less when sparsity is exploited. For large sparse problems (thousands of variables and up), interior point methods often beat simplex. For small dense problems, simplex can be faster.
Practical considerations
Preprocessing. Modern IP solvers spend significant effort on preprocessing: removing redundant constraints, fixing variables, scaling the problem. This can reduce a million-variable LP to a much smaller equivalent.
Mehrotra’s predictor-corrector. The most important practical improvement. Instead of one Newton step per iteration, take a “predictor” step (pure Newton) and a “corrector” step (centering). This roughly halves the number of outer iterations in practice.
Crossover. Interior point methods find solutions in the interior, not at vertices. If you need an exact vertex solution (for integer programming, for instance), you run a “crossover” procedure that identifies the active constraints and moves to a nearby vertex.
Python sketch: barrier method
```python
import numpy as np

def barrier_method(c, A, b, x0, t0=1.0, kappa=10.0, tol=1e-8, max_outer=50):
    """Solve min c^T x s.t. Ax <= b using the log barrier method."""
    x = x0.copy()
    m = len(b)
    t = t0
    for outer in range(max_outer):
        # Newton's method for the barrier subproblem t*c^T x - sum log(b - Ax)
        for _ in range(50):
            s = b - A @ x  # slacks, must be positive
            grad = t * c + A.T @ (1.0 / s)
            H = A.T @ np.diag(1.0 / s**2) @ A
            dx = np.linalg.solve(H, -grad)
            # Backtracking line search to stay strictly feasible
            step = 1.0
            while np.any(b - A @ (x + step * dx) <= 0):
                step *= 0.5
            x = x + 0.9 * step * dx
            if np.linalg.norm(dx) < 1e-10:
                break
        gap = m / t
        if gap < tol:
            break
        t *= kappa
    return x, c @ x
```
What comes next
Interior point methods solve LPs in polynomial time, but the simplex method is the classic algorithm that started it all. Despite its exponential worst-case complexity, simplex is often the fastest method in practice for small to medium problems. We will walk through it step by step, including the tableau, pivoting, and degeneracy handling.