
Interior point methods

In this series (18 parts)
  1. What is optimization and why ML needs it
  2. Convex sets and convex functions
  3. Optimality conditions: first order
  4. Optimality conditions: second order
  5. Line search methods
  6. Least squares: the closed-form solution
  7. Steepest descent (gradient descent)
  8. Newton's method for optimization
  9. Quasi-Newton methods: BFGS and L-BFGS
  10. Conjugate gradient methods
  11. Constrained optimization and Lagrangian duality
  12. KKT conditions
  13. Penalty and barrier methods
  14. Interior point methods
  15. The simplex method
  16. Frank-Wolfe method
  17. Optimization in dynamic programming and optimal control
  18. Stochastic gradient descent and variants

Prerequisites

This article builds directly on penalty and barrier methods. You should understand the log barrier function and how it keeps iterates in the interior of the feasible region. Familiarity with KKT conditions and Newton’s method is essential.


From barrier method to interior point method

The log barrier method from the previous article solves a sequence of unconstrained problems, each with a different barrier parameter $t$. An interior point method takes this idea and makes it practical by using Newton's method to solve each barrier subproblem, then increasing $t$ in a controlled way.

The result is an algorithm with a provable iteration bound: for a linear program with $m$ inequality constraints, $O(\sqrt{m}\,\ln(m/\epsilon))$ Newton steps suffice to reach a solution with duality gap below $\epsilon$. That is a remarkable guarantee.


The central path

Consider a convex optimization problem:

$$\min_{x} \; f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad i = 1, \dots, m$$

where $f$ and all $g_i$ are convex. The barrier problem is:

$$\min_{x} \; t \cdot f(x) + \phi(x), \qquad \phi(x) = -\sum_{i=1}^{m} \ln(-g_i(x))$$

Here we use the convention where $t$ multiplies the objective (equivalent to the formulation in the previous article, just reparametrized). For each value of $t > 0$, the barrier problem has a unique minimizer $x^*(t)$, assuming strict feasibility.
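The two parameterizations are interchangeable: dividing the objective by $t$ shows that minimizing $t f + \phi$ is the same as minimizing $f$ with a barrier weight of $1/t$,

$$\arg\min_x \; \bigl( t\, f(x) + \phi(x) \bigr) = \arg\min_x \; \Bigl( f(x) + \frac{1}{t}\, \phi(x) \Bigr),$$

so increasing $t$ here corresponds to shrinking the barrier weight toward zero.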

The set of points $\{x^*(t) : t > 0\}$ forms the central path. It is a smooth curve that:

  • Starts deep in the interior (small $t$, barrier dominates)
  • Ends at the constrained optimum (as $t \to \infty$, objective dominates)
  • Approaches the "analytic center" of the feasible region as $t \to 0$
graph LR
  A["Analytic center<br/>(t small)"] -->|"Central path"| B["Near optimum<br/>(t large)"]
  B --> C["Constrained optimum<br/>(t → ∞)"]

KKT interpretation

At each point on the central path, the optimality condition gives:

$$t \nabla f(x^*(t)) + \nabla \phi(x^*(t)) = 0$$
$$t \nabla f(x^*(t)) + \sum_{i=1}^{m} \frac{1}{-g_i(x^*(t))} \nabla g_i(x^*(t)) = 0$$

Define $\lambda_i^*(t) = \dfrac{1}{-t \, g_i(x^*(t))}$. Then the central path point satisfies:

$$\nabla f(x^*(t)) + \sum_{i=1}^{m} \lambda_i^*(t) \nabla g_i(x^*(t)) = 0, \qquad \lambda_i^*(t) \cdot \bigl(-g_i(x^*(t))\bigr) = \frac{1}{t}$$

Compare with the exact KKT conditions, which require $\lambda_i g_i(x) = 0$. The central path satisfies a "relaxed" complementary slackness: the product $\lambda_i (-g_i)$ equals $1/t$ instead of zero. As $t \to \infty$, this approaches exact complementarity.

The duality gap at a central path point is:

$$\eta = \frac{m}{t}$$

where $m$ is the number of constraints. This gives a direct measure of suboptimality.
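To see where the formula comes from: $x^*(t)$ minimizes the Lagrangian at $\lambda^*(t)$, so the duality gap equals the sum of the complementarity products, each of which is exactly $1/t$ on the central path:

$$\eta = \sum_{i=1}^{m} \lambda_i^*(t)\,\bigl(-g_i(x^*(t))\bigr) = \sum_{i=1}^{m} \frac{1}{t} = \frac{m}{t}.$$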


Example 1: Central path for a 2D linear program

Problem:

$$\min_{x_1, x_2} \; -x_1 - x_2$$

subject to:

$$x_1 \ge 0, \qquad x_2 \ge 0, \qquad x_1 + x_2 \le 4$$

Because the objective $-x_1 - x_2$ weights both variables equally, every point on the segment from $(4, 0)$ to $(0, 4)$ along the edge $x_1 + x_2 = 4$ is optimal. The optimal value is $-4$.

[Figure: central path for this LP. The path runs from the analytic center through the interior of the feasible region toward the point $(2, 2)$ on the optimal face.]

Barrier formulation (with three constraints $g_1 = -x_1$, $g_2 = -x_2$, $g_3 = x_1 + x_2 - 4$):

$$\min_{x_1, x_2} \; t(-x_1 - x_2) - \ln(x_1) - \ln(x_2) - \ln(4 - x_1 - x_2)$$

By symmetry of the objective and constraints in $x_1, x_2$, the central path satisfies $x_1 = x_2$. Setting $x_1 = x_2 = x$:

$$B(x; t) = t(-2x) - 2\ln(x) - \ln(4 - 2x)$$
$$\frac{dB}{dx} = -2t - \frac{2}{x} + \frac{2}{4 - 2x} = 0$$
$$-2t - \frac{2}{x} + \frac{1}{2 - x} = 0$$

For $t = 1$:

We need $-2 - 2/x + 1/(2-x) = 0$. Multiply through by $x(2-x)$:

$$-2x(2-x) - 2(2-x) + x = 0$$
$$-4x + 2x^2 - 4 + 2x + x = 0$$
$$2x^2 - x - 4 = 0$$
$$x = \frac{1 + \sqrt{1 + 32}}{4} = \frac{1 + \sqrt{33}}{4} \approx \frac{1 + 5.745}{4} \approx 1.686$$

Central path point: $(1.686, 1.686)$. Objective: $-3.372$. Duality gap: $m/t = 3/1 = 3$.

For $t = 10$:

$-20 - 2/x + 1/(2-x) = 0$. Multiply by $x(2-x)$:

$$-20x(2-x) - 2(2-x) + x = 0$$
$$-40x + 20x^2 - 4 + 2x + x = 0$$
$$20x^2 - 37x - 4 = 0$$
$$x = \frac{37 + \sqrt{1369 + 320}}{40} = \frac{37 + \sqrt{1689}}{40} \approx \frac{37 + 41.10}{40} \approx 1.953$$

Central path point: $(1.953, 1.953)$. Objective: $-3.906$. Duality gap: $3/10 = 0.3$.

For $t = 100$:

Following the same algebra (the quadratic becomes $200x^2 - 397x - 4 = 0$), $x \approx 1.995$. Objective: $\approx -3.990$. Duality gap: $0.03$.

The central path approaches $(2, 2)$ as $t \to \infty$, which is the analytic center of the optimal face.
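These hand calculations are easy to check mechanically. The sketch below (my own verification code, not part of the worked example) solves the quadratic $2t\,x^2 - (4t - 3)x - 4 = 0$, which is what the derivative condition becomes for general $t$:

```python
import math

def central_point(t):
    """Interior root of 2t*x^2 - (4t - 3)*x - 4 = 0, i.e. the central-path
    point x1 = x2 = x of the symmetric LP for barrier parameter t."""
    a, b, c = 2 * t, -(4 * t - 3), -4
    # The roots have opposite signs (c/a < 0); the positive one lies in (0, 2).
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

for t in (1, 10, 100):
    x = central_point(t)
    print(f"t={t}: x={x:.4f}, objective={-2 * x:.4f}, gap={3 / t}")
```

As $t$ grows, the point climbs toward $x = 2$, matching the limit computed by hand.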


The barrier method algorithm

Input: strictly feasible x, initial t > 0, growth factor κ > 1, tolerance ε
Repeat:
    1. Centering step: starting from x, run Newton's method to minimize
       t·f(x) + φ(x), obtaining x*(t)
    2. Update: x ← x*(t)
    3. Stopping test: if m/t < ε, return x
    4. Increase: t ← κ·t

With a fixed growth factor $\kappa$, the outer loop (steps 1 to 4) runs $\lceil \ln(m/(\epsilon\, t_0)) / \ln \kappa \rceil$ times. Each centering step takes a bounded number of Newton iterations (typically 5 to 15 in practice); the $O(\sqrt{m})$ total Newton-step bound comes from a more careful choice of $\kappa$, discussed below.

Choosing $\kappa$

  • Small $\kappa$ (like 1.2): many outer iterations, but each centering step needs few Newton steps.
  • Large $\kappa$ (like 100): few outer iterations, but each centering step needs many Newton steps.
  • Theory suggests $\kappa = 1 + 1/\sqrt{m}$ balances these, but in practice $\kappa$ between 2 and 20 works well.
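The outer-iteration side of this tradeoff is easy to quantify: with multiplicative updates, the number of rounds needed to drive the gap bound $m/t$ below $\epsilon$ is logarithmic in $m/(\epsilon t_0)$ with base $\kappa$. A quick sketch (my own illustration, using $m = 5$, $t_0 = 1$, $\epsilon = 10^{-6}$ as example values):

```python
def outer_iterations(m, t0, kappa, eps):
    """Outer iterations until the duality-gap bound m/t drops below eps,
    with t multiplied by kappa each round starting from t0."""
    t, count = t0, 0
    while m / t >= eps:
        t *= kappa
        count += 1
    return count

m, t0, eps = 5, 1.0, 1e-6
for kappa in (1.2, 10.0, 100.0):
    print(f"kappa={kappa}: {outer_iterations(m, t0, kappa, eps)} outer iterations")
```

The counts drop sharply as $\kappa$ grows, but each centering step then has to cover a larger change in $t$, so the Newton work per round increases.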

Example 2: Tracing the barrier method on a small LP

Problem (in standard inequality form):

$$\min \; -2x_1 - x_2 \quad \text{s.t.} \quad x_1 + x_2 \le 4, \quad x_1 \le 3, \quad x_2 \le 3, \quad x_1, x_2 \ge 0$$

There are $m = 5$ inequality constraints. The optimum is at $(3, 1)$ with objective $-7$.

Iteration 1: $t = 1$

The barrier problem balances the objective against all five log barrier terms. Newton's method finds the central path point. Because the objective pulls harder on $x_1$ than on $x_2$ ($-2x_1$ versus $-x_2$), the solution favors larger $x_1$.

Numerical solution: $x \approx (2.12, 1.14)$. Objective: $\approx -5.37$. Duality gap: $m/t = 5$.

Iteration 2: $t = 10$

The objective dominates more. Newton's method (warm-started from the previous solution) converges in a handful of steps.

Numerical solution: $x \approx (2.90, 1.01)$. Objective: $\approx -6.80$. Duality gap: $0.5$.

Iteration 3: $t = 100$

Now the barrier has very little influence. The solution is pushed close to the vertex.

Numerical solution: $x \approx (2.990, 1.000)$. Objective: $\approx -6.98$. Duality gap: $0.05$.

Iteration 4: $t = 1000$

Numerical solution: $x \approx (2.999, 1.000)$. Objective: $\approx -6.998$. Duality gap: $0.005$.

Total Newton steps across all iterations: a few dozen. Compare this with the simplex method, which would find the exact vertex solution in a handful of pivots for this small problem. Interior point methods shine on much larger problems.
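The iterates above can be reproduced with a short damped-Newton centering routine. This is my own NumPy sketch (function names, tolerances, and the backtracking rule are illustrative choices, not from a production solver):

```python
import numpy as np

# LP: min -2x1 - x2  s.t.  x1 + x2 <= 4, x1 <= 3, x2 <= 3, x1 >= 0, x2 >= 0
A = np.array([[1.0, 1], [1, 0], [0, 1], [-1, 0], [0, -1]])
b = np.array([4.0, 3, 3, 0, 0])
c = np.array([-2.0, -1])

def barrier_value(x, t):
    s = b - A @ x
    return np.inf if np.any(s <= 0) else t * (c @ x) - np.log(s).sum()

def center(x, t, iters=100):
    """Damped Newton for the centering problem min t*c^T x + phi(x)."""
    for _ in range(iters):
        s = b - A @ x                    # slacks, positive in the interior
        grad = t * c + A.T @ (1.0 / s)   # gradient of t*c^T x - sum ln(s_i)
        if np.linalg.norm(grad) < 1e-8 * max(t, 1.0):
            break
        H = (A.T * (1.0 / s**2)) @ A     # Hessian A^T diag(1/s^2) A
        dx = np.linalg.solve(H, -grad)
        step = 1.0
        # Backtrack until the step both stays feasible and decreases B
        while barrier_value(x + step * dx, t) > barrier_value(x, t):
            step *= 0.5
        x = x + step * dx
    return x

x = np.array([1.0, 1.0])
for t in (1.0, 10.0, 100.0, 1000.0):
    x = center(x, t)                     # warm start from the previous point
    print(t, np.round(x, 3), round(c @ x, 3), len(b) / t)
```

Each call to `center` is one centering step of the barrier method; warm-starting from the previous point keeps the Newton iteration counts small.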


Primal-dual interior point methods

The basic barrier method works, but primal-dual methods are what people actually use in production solvers (CPLEX, Gurobi, MOSEK). The idea: instead of solving the barrier subproblem exactly, solve the KKT system for the barrier problem approximately, updating both primal variables $x$ and dual variables $\lambda$ simultaneously.

The modified KKT system

For the barrier problem, the KKT conditions are:

$$\nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) = 0$$
$$\lambda_i \bigl(-g_i(x)\bigr) = \frac{1}{t}, \quad i = 1, \dots, m$$
$$g_i(x) < 0, \qquad \lambda_i > 0$$

In matrix form, define $s_i = -g_i(x)$ (the slacks). Then the second condition becomes $\lambda_i s_i = 1/t$ for all $i$, or equivalently $\Lambda S e = (1/t) e$, where $\Lambda = \mathrm{diag}(\lambda)$, $S = \mathrm{diag}(s)$, and $e$ is the all-ones vector.

A Newton step on this system gives the primal-dual search direction. The key advantage over the basic barrier method: you solve one linear system per iteration instead of running Newton to convergence on the barrier subproblem.

For linear programs

When $f(x) = c^T x$ and the constraints are $Ax \le b$ (so $s = b - Ax$), the system simplifies considerably. The Newton step reduces to solving:

$$\begin{bmatrix} 0 & A^T & 0 \\ A & 0 & I \\ 0 & S & \Lambda \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \\ \Delta s \end{bmatrix} = -\begin{bmatrix} r_{\text{dual}} \\ r_{\text{primal}} \\ r_{\text{cent}} \end{bmatrix}$$

where $r_{\text{dual}} = c + A^T \lambda$, $r_{\text{primal}} = Ax + s - b$, and $r_{\text{cent}} = \Lambda S e - \sigma \mu e$ are the residuals for dual feasibility, primal feasibility, and centering respectively. Here $\mu = s^T \lambda / m$ is the current duality measure and $\sigma \in (0, 1)$ is a centering parameter; the target $\sigma \mu$ plays the role of $1/t$.


Example 3: One primal-dual step on a tiny LP

Problem:

$$\min \; -3x_1 - 2x_2 \quad \text{s.t.} \quad x_1 + x_2 \le 4, \quad x_1 \le 3, \quad x_1, x_2 \ge 0$$

Written in standard form with slacks $s_1 = 4 - x_1 - x_2$, $s_2 = 3 - x_1$, $s_3 = x_1$, $s_4 = x_2$.

Current point (feasible, not optimal): $x = (1, 1)$, so $s = (2, 2, 1, 1)$.

Set $\lambda = (0.5, 0.5, 0.5, 0.5)$ (rough initial guess). The centering parameter is $\sigma = 0.5$, and

$$\mu = \frac{s^T \lambda}{m} = \frac{0.5 \cdot 2 + 0.5 \cdot 2 + 0.5 \cdot 1 + 0.5 \cdot 1}{4} = \frac{3}{4} = 0.75$$

Target: $\lambda_i s_i = \sigma \mu = 0.375$.

Centering residual for each constraint:

$$r_{\text{cent}, i} = \lambda_i s_i - \sigma \mu$$
$$r_1 = 0.5 \times 2 - 0.375 = 0.625, \qquad r_2 = 0.625, \qquad r_3 = 0.5 \times 1 - 0.375 = 0.125, \qquad r_4 = 0.125$$

Dual residual (gradient of the Lagrangian). Write the constraints as $Ax + s = b$ with

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ -1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} 4 \\ 3 \\ 0 \\ 0 \end{pmatrix}$$

where the last two rows encode $x_1 \ge 0$ and $x_2 \ge 0$ as $-x_1 \le 0$ and $-x_2 \le 0$. Then

$$r_{\text{dual}} = c + A^T \lambda = \begin{pmatrix} -3 \\ -2 \end{pmatrix} + \begin{pmatrix} 0.5 + 0.5 - 0.5 + 0 \\ 0.5 + 0 + 0 - 0.5 \end{pmatrix} = \begin{pmatrix} -2.5 \\ -2 \end{pmatrix}$$

The dual residual is nonzero, indicating the current $\lambda$ is not dual feasible. The Newton system produces a direction $(\Delta x, \Delta \lambda, \Delta s)$ that simultaneously improves primal optimality, dual feasibility, and centering.

After solving the linear system and taking a step with an appropriate step size (ensuring $s, \lambda$ remain positive), we get a new iterate closer to the optimum $(3, 1)$.

The main takeaway: each primal-dual iteration solves one linear system and makes progress on all fronts simultaneously.
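To make the step concrete, here is a self-contained NumPy sketch of one primal-dual Newton iteration on this tiny LP. The residual sign convention is stated in the comments, and the fraction-to-boundary factor 0.99 is a common heuristic choice, not something fixed by the method:

```python
import numpy as np

# Tiny LP from the example: min -3x1 - 2x2  s.t.  Ax <= b
A = np.array([[1.0, 1], [1, 0], [-1, 0], [0, -1]])
b = np.array([4.0, 3, 0, 0])
c = np.array([-3.0, -2])

x = np.array([1.0, 1.0])             # strictly feasible start
lam = np.full(4, 0.5)                # rough dual guess
s = b - A @ x                        # slacks: (2, 2, 1, 1)

sigma = 0.5
mu = s @ lam / len(b)                # duality measure: 0.75

# Residuals: dual = c + A^T lam, primal = Ax + s - b, cent = lam*s - sigma*mu
r_dual = c + A.T @ lam               # (-2.5, -2.0): lam is not dual feasible
r_primal = A @ x + s - b             # zero: x is primal feasible
r_cent = lam * s - sigma * mu        # (0.625, 0.625, 0.125, 0.125)

# Assemble and solve the primal-dual Newton system for (dx, dlam, ds)
n, m = 2, 4
M = np.zeros((n + 2 * m, n + 2 * m))
M[:n, n:n + m] = A.T                 # dual feasibility rows
M[n:n + m, :n] = A                   # primal feasibility rows
M[n:n + m, n + m:] = np.eye(m)
M[n + m:, n:n + m] = np.diag(s)      # centering rows
M[n + m:, n + m:] = np.diag(lam)
rhs = -np.concatenate([r_dual, r_primal, r_cent])
d = np.linalg.solve(M, rhs)
dx, dlam, ds = d[:n], d[n:n + m], d[n + m:]

# Fraction-to-boundary step keeps lam and s strictly positive
alpha = 1.0
for v, dv in ((lam, dlam), (s, ds)):
    neg = dv < 0
    if neg.any():
        alpha = min(alpha, 0.99 * np.min(-v[neg] / dv[neg]))
x, lam, s = x + alpha * dx, lam + alpha * dlam, s + alpha * ds
```

One linear solve, and the new iterate makes progress on all three residuals at once.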


Self-concordant barriers

The complexity analysis of interior point methods relies on a special property of the barrier function called self-concordance. A function $f$ is self-concordant if:

$$|f'''(x)| \le 2 \bigl(f''(x)\bigr)^{3/2}$$

in one dimension, with a generalization to higher dimensions. The log barrier $-\ln(x)$ is self-concordant, and sums of self-concordant functions are self-concordant.
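A quick numerical sanity check (my own, using the derivatives $f''(x) = 1/x^2$ and $f'''(x) = -2/x^3$) shows the log barrier in fact meets the bound with equality:

```python
import numpy as np

# For f(x) = -ln(x):  f''(x) = 1/x^2,  f'''(x) = -2/x^3,
# so |f'''(x)| = 2/x^3 = 2 * (f''(x))**1.5 -- equality in the bound.
xs = np.linspace(0.1, 10.0, 50)
f2 = 1.0 / xs**2
f3 = -2.0 / xs**3
assert np.allclose(np.abs(f3), 2.0 * f2**1.5)
print("self-concordance bound is tight for -ln(x)")
```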

Why does this matter? Self-concordance guarantees that Newton’s method converges quadratically from any point in a well-defined neighborhood of the minimizer, without needing a line search. This is what gives interior point methods their theoretical complexity bounds.

The complexity parameter $\nu$ of a self-concordant barrier equals the number of constraints $m$ for the standard log barrier. Better barriers with smaller $\nu$ exist for specific constraint structures (second-order cones, semidefinite constraints).


Complexity comparison

| Method | Iterations for LP with $m$ constraints, $n$ variables |
| --- | --- |
| Simplex | Worst case exponential, average polynomial, very fast in practice |
| Basic barrier | $O(\sqrt{m} \ln(m/\epsilon))$ Newton steps |
| Primal-dual IP | $O(\sqrt{m} \ln(1/\epsilon))$ iterations |

Each interior point iteration costs $O(n^2 m)$ to $O(n^3)$ depending on structure. For large sparse problems (thousands of variables), interior point methods often beat simplex. For small dense problems, simplex can be faster.


Practical considerations

Preprocessing. Modern IP solvers spend significant effort on preprocessing: removing redundant constraints, fixing variables, scaling the problem. This can reduce a million-variable LP to a much smaller equivalent.

Mehrotra’s predictor-corrector. The most important practical improvement. Instead of one Newton step per iteration, take a “predictor” step (pure Newton) and a “corrector” step (centering). This roughly halves the number of outer iterations in practice.

Crossover. Interior point methods find solutions in the interior, not at vertices. If you need an exact vertex solution (for integer programming, for instance), you run a “crossover” procedure that identifies the active constraints and moves to a nearby vertex.


Python sketch: barrier method

import numpy as np

def barrier_method(c, A, b, x0, t0=1.0, kappa=10.0, tol=1e-8, max_outer=50):
    """Solve min c^T x s.t. Ax <= b using the log barrier method."""
    x = x0.copy()
    m = len(b)
    t = t0
    
    for outer in range(max_outer):
        # Newton's method for the barrier subproblem
        for _ in range(50):
            s = b - A @ x  # slacks, must be positive
            grad = t * c + A.T @ (1.0 / s)  # gradient of t*c^T x - sum ln(s_i)
            H = A.T @ np.diag(1.0 / s**2) @ A
            dx = np.linalg.solve(H, -grad)
            
            # Backtracking line search to stay feasible
            step = 1.0
            while np.any(b - A @ (x + step * dx) <= 0):
                step *= 0.5
            x = x + 0.9 * step * dx
            
            if np.linalg.norm(dx) < 1e-10:
                break
        
        gap = m / t
        if gap < tol:
            break
        t *= kappa
    
    return x, c @ x

What comes next

Interior point methods solve LPs in polynomial time, but the simplex method is the classic algorithm that started it all. Despite its exponential worst-case complexity, simplex is often the fastest method in practice for small to medium problems. We will walk through it step by step, including the tableau, pivoting, and degeneracy handling.
