Interior point methods
In this series (18 parts)
- What is optimization and why ML needs it
- Convex sets and convex functions
- Optimality conditions: first order
- Optimality conditions: second order
- Line search methods
- Least squares: the closed-form solution
- Steepest descent (gradient descent)
- Newton's method for optimization
- Quasi-Newton methods: BFGS and L-BFGS
- Conjugate gradient methods
- Constrained optimization and Lagrangian duality
- KKT conditions
- Penalty and barrier methods
- Interior point methods
- The simplex method
- Frank-Wolfe method
- Optimization in dynamic programming and optimal control
- Stochastic gradient descent and variants
Prerequisites
This article builds directly on penalty and barrier methods. You should understand the log barrier function and how it keeps iterates in the interior of the feasible region. Familiarity with KKT conditions and Newton’s method is essential.
From barrier method to interior point method
The log barrier method from the previous article solves a sequence of unconstrained problems, each with a different barrier parameter t. An interior point method takes this idea and makes it practical by using Newton's method to solve each barrier subproblem, then increasing t in a controlled way.
The result is an algorithm with a provable iteration bound: for a linear program with m inequality constraints, you need O(√m · log(m/ε)) Newton steps to reach a solution with duality gap below ε. That is a remarkable guarantee.
The central path
Consider a convex optimization problem:

minimize f(x)
subject to fᵢ(x) ≤ 0, i = 1, …, m

where f and all fᵢ are convex. The barrier problem is:

minimize t·f(x) + φ(x), where φ(x) = −Σᵢ log(−fᵢ(x))

Here we use the convention where t multiplies the objective (equivalent to the formulation in the previous article, just reparametrized). For each value of t > 0, the barrier problem has a unique minimizer x*(t), assuming strict feasibility.
The set of points {x*(t) : t > 0} forms the central path. It is a smooth curve that:
- Starts deep in the interior (small t, barrier dominates)
- Ends at the constrained optimum (as t → ∞, objective dominates)
- Passes through the “analytic center” of the feasible region as t → 0
```mermaid
graph LR
    A["Analytic center<br/>(t small)"] -->|"Central path"| B["Near optimum<br/>(t large)"]
    B --> C["Constrained optimum<br/>(t → ∞)"]
```
KKT interpretation
At each point on the central path, the optimality condition ∇[t·f(x) + φ(x)] = 0 gives:

t·∇f(x*(t)) + Σᵢ (1/(−fᵢ(x*(t))))·∇fᵢ(x*(t)) = 0

Define λᵢ*(t) = 1/(−t·fᵢ(x*(t))). Then the central path point satisfies:

∇f(x*(t)) + Σᵢ λᵢ*(t)·∇fᵢ(x*(t)) = 0

Compare with the exact KKT conditions, which require λᵢ·fᵢ(x) = 0 for every i. The central path satisfies a “relaxed” complementary slackness: the product −λᵢ*(t)·fᵢ(x*(t)) equals 1/t instead of zero. As t → ∞, this approaches exact complementarity.

The duality gap at a central path point is:

m/t

where m is the number of constraints. This gives a direct measure of suboptimality: f(x*(t)) is within m/t of the optimal value.
Example 1: Central path for a 2D linear program
Problem:

minimize −x − y
subject to: x ≥ 0, y ≥ 0, x + y ≤ 4

The optimum is at (0, 4) or (4, 0) or anywhere on the edge x + y = 4: since the objective is −(x + y), both variables contribute equally, so the optimum is any point on the line segment from (0, 4) to (4, 0). The optimal value is −4.
Central path for the LP min −x − y subject to x ≥ 0, y ≥ 0, x + y ≤ 4. The path curves through the interior of the feasible region toward the midpoint (2, 2) of the optimal edge.
Barrier formulation (with three constraints x ≥ 0, y ≥ 0, x + y ≤ 4):

minimize t·(−x − y) − log x − log y − log(4 − x − y)

By symmetry of the objective and constraints in x and y, the central path satisfies x = y. Setting x = y, the one-dimensional optimality condition (after dividing by 2) is:

−t − 1/x + 1/(4 − 2x) = 0
For t = 1:

We need −1 − 1/x + 1/(4 − 2x) = 0. Multiply through by x(4 − 2x):

−x(4 − 2x) − (4 − 2x) + x = 0 ⟹ 2x² − x − 4 = 0 ⟹ x = (1 + √33)/4 ≈ 1.686

Central path point: (1.686, 1.686). Objective: −3.372. Duality gap: m/t = 3/1 = 3.
For t = 10:

−10 − 1/x + 1/(4 − 2x) = 0. Multiply by x(4 − 2x):

20x² − 37x − 4 = 0 ⟹ x = (37 + √1689)/40 ≈ 1.952

Central path point: (1.952, 1.952). Objective: −3.905. Duality gap: 3/10 = 0.3.
For t = 100:

Following the same algebra (or noting the pattern 2t·x² − (4t − 3)·x − 4 = 0), x ≈ 1.995. Central path point: (1.995, 1.995). Objective: −3.990. Duality gap: 3/100 = 0.03.
The central path approaches (2, 2) as t → ∞, which is the point on the optimal face closest to the analytic center.
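These central-path points are easy to compute in a few lines. Clearing denominators in the symmetric stationarity condition −t − 1/x + 1/(4 − 2x) = 0 for the Example 1 LP gives the quadratic 2t·x² − (4t − 3)·x − 4 = 0, whose positive root is the central path point (a small sketch, nothing beyond the standard library):

```python
import math

def central_path_point(t):
    """Central path of min -x - y s.t. x >= 0, y >= 0, x + y <= 4.
    By symmetry x = y on the path, and x solves 2t*x^2 - (4t - 3)*x - 4 = 0."""
    a, b, c = 2 * t, -(4 * t - 3), -4
    # positive root of the quadratic (the other root is negative, hence infeasible)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

for t in [1, 10, 100, 1000]:
    x = central_path_point(t)
    print(f"t={t:5d}  x=y={x:.4f}  objective={-2 * x:.4f}  gap bound={3 / t}")
```

As t grows, x = y approaches 2 and the objective approaches −4, matching the hand computation.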
The barrier method algorithm
Input: strictly feasible x, initial t > 0, growth factor κ > 1, tolerance ε
Repeat:
1. Centering step: starting from x, run Newton's method to minimize
t·f(x) + φ(x), obtaining x*(t)
2. Update: x ← x*(t)
3. Stopping test: if m/t < ε, return x
4. Increase: t ← κ·t
The outer loop (steps 1 to 4) runs ⌈log(m/(t₀·ε))/log κ⌉ times. Each centering step takes a bounded number of Newton iterations (typically 5 to 15 in practice).
Choosing κ
- Small κ (like 1.2): many outer iterations, but each centering step needs few Newton steps.
- Large κ (like 100): few outer iterations, but each centering step needs many Newton steps.
- Theory suggests κ ≈ 1 + 1/√m balances these, but in practice κ between 2 and 20 works well.
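The trade-off can be quantified. With the stopping test m/t < ε and growth factor κ, the outer loop runs until t exceeds m/ε, which takes ⌈log(m/(t₀·ε))/log κ⌉ iterations — a quick sketch:

```python
import math

def outer_iterations(m, t0, eps, kappa):
    """Outer iterations of the barrier method: t grows by a factor kappa
    each round until m/t < eps, i.e. until t > m/eps."""
    return math.ceil(math.log(m / (t0 * eps)) / math.log(kappa))

for kappa in [1.2, 2, 10, 100]:
    print(f"kappa={kappa:6}  outer iterations={outer_iterations(100, 1.0, 1e-8, kappa)}")
```

Small κ drives the count into the hundreds; the total Newton-step count also depends on how many inner iterations each centering step needs, which is what makes moderate κ the practical sweet spot.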
Example 2: Tracing the barrier method on a small LP
Problem (in standard inequality form):

minimize −2x₁ − x₂
subject to: x₁ ≥ 0, x₂ ≥ 0, x₁ + x₂ ≤ 4, x₁ ≤ 3, x₂ ≤ 3

There are m = 5 inequality constraints. The optimum is at (3, 1) with objective −7.
Iteration 1: t = 1

The barrier problem balances the objective against all five log barrier terms. Newton’s method finds the central path point. Due to symmetry-breaking from the objective (x₁ pulls harder than x₂), the solution favors larger x₁.

Numerical solution: x*(1) ≈ (2.12, 1.14). Objective: −5.37. Duality gap: 5/1 = 5.
Iteration 2: t = 10

The objective dominates more. Newton’s method (warm-started from the previous solution) converges in about 6 steps.

Numerical solution: x*(10) ≈ (2.90, 1.01). Objective: −6.80. Duality gap: 5/10 = 0.5.
Iteration 3: t = 100

Now the barrier has very little influence. The solution is pushed close to the vertex.

Numerical solution: x*(100) ≈ (2.99, 1.00). Objective: −6.98. Duality gap: 5/100 = 0.05.
Iteration 4: t = 1000

Numerical solution: x*(1000) ≈ (2.999, 1.000). Objective: −6.998. Duality gap: 5/1000 = 0.005.
Total Newton steps across all iterations: about 25. Compare this with the simplex method, which would find the exact vertex solution in 2 to 5 pivots for this small problem. Interior point methods shine on much larger problems.
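A trace like this is easy to reproduce. The snippet below runs a damped-Newton centering loop on an assumed LP instance with two variables and five constraints (illustrative numbers chosen here, not necessarily the article's exact problem):

```python
import numpy as np

# Assumed LP instance (illustrative):
#   min -2*x1 - x2  s.t.  x1 >= 0, x2 >= 0, x1 + x2 <= 4, x1 <= 3, x2 <= 3
c = np.array([-2.0, -1.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 4.0, 3.0, 3.0])

def center(t, x, iters=100):
    """Damped Newton on the barrier objective t*c@x - sum(log(b - A@x))."""
    for _ in range(iters):
        s = b - A @ x                              # slacks, strictly positive
        grad = t * c + A.T @ (1.0 / s)
        H = A.T @ ((1.0 / s**2)[:, None] * A)
        dx = np.linalg.solve(H, -grad)
        step = 1.0
        while (b - A @ (x + step * dx) <= 0).any():  # backtrack to stay feasible
            step *= 0.5
        x = x + 0.9 * step * dx
        if np.linalg.norm(dx) < 1e-12:
            break
    return x

x = np.array([1.0, 1.0])   # strictly feasible start
pts = {}
for t in [1, 10, 100, 1000]:
    x = center(t, x)       # warm start from the previous center
    pts[t] = x.copy()
    print(f"t={t:5d}  x*(t)=({x[0]:.3f}, {x[1]:.3f})  gap bound={5 / t}")
```

The iterates march along the central path toward the vertex (3, 1), with the guaranteed gap m/t shrinking by a factor of 10 per outer iteration.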
Primal-dual interior point methods
The basic barrier method works, but primal-dual methods are what people actually use in production solvers (CPLEX, Gurobi, MOSEK). The idea: instead of solving the barrier subproblem exactly, solve the KKT system for the barrier problem approximately, updating both primal variables and dual variables simultaneously.
The modified KKT system
For the barrier problem, the KKT conditions are:

∇f(x) + Σᵢ λᵢ·∇fᵢ(x) = 0
−λᵢ·fᵢ(x) = 1/t, i = 1, …, m
fᵢ(x) < 0, λᵢ > 0

In matrix form, define sᵢ = −fᵢ(x) (the slacks). Then the second condition becomes λᵢ·sᵢ = 1/t for all i, or equivalently ΛS𝟙 = (1/t)𝟙, where Λ = diag(λ), S = diag(s), and 𝟙 is the all-ones vector.
A Newton step on this system gives the primal-dual search direction. The key advantage over the basic barrier method: you solve one linear system per iteration instead of running Newton to convergence on the barrier subproblem.
For linear programs
When f(x) = cᵀx and the constraints are Ax ≤ b (slacks s = b − Ax), the system simplifies considerably. The Newton step reduces to solving:

Aᵀ·Δλ = −r_d
A·Δx + Δs = −r_p
S·Δλ + Λ·Δs = −r_c

where r_d = c + Aᵀλ, r_p = Ax + s − b, and r_c = ΛS𝟙 − σμ𝟙 are the residuals for dual feasibility, primal feasibility, and centering respectively (σ ∈ (0, 1) is a centering parameter and μ = sᵀλ/m is the current duality measure).
Example 3: One primal-dual step on a tiny LP
Problem:

minimize −x₁ − 2x₂
subject to: x₁ + x₂ ≤ 2, x₂ ≤ 1.5, x₁ ≥ 0, x₂ ≥ 0

Written in standard inequality form Ax ≤ b with slacks s₁, s₂, s₃, s₄, one per constraint. The optimum is at (0.5, 1.5) with objective −3.5.

Current point (feasible, not optimal): x = (1, 0.5), so s = b − Ax = (0.5, 1, 1, 0.5).

Set λ = (1, 1, 1, 1) (rough initial guess). The centering parameter is σ = 0.1, and μ = sᵀλ/m = 0.75.

Target: λᵢ·sᵢ = σμ = 0.075 for each constraint.
Centering residual for each constraint: r_c = ΛS𝟙 − σμ𝟙 = (0.425, 0.925, 0.925, 0.425).
Dual residual (gradient of the Lagrangian): r_d = c + Aᵀλ. The sign convention depends on writing every constraint as a row of Ax ≤ b with s = b − Ax. Here the constraint matrix is

A = [[1, 1], [0, 1], [−1, 0], [0, −1]]

where the last two rows handle x₁ ≥ 0 and x₂ ≥ 0 as −x₁ ≤ 0 and −x₂ ≤ 0. With λ = 𝟙 this gives r_d = (−1, −2) + (0, 1) = (−1, −1).
With this setup, the dual residual is nonzero, indicating that the current λ is not dual feasible. The Newton system produces a direction that simultaneously improves primal optimality, dual feasibility, and centering.
After solving the linear system and taking a step with an appropriate step size (ensuring that s and λ remain positive), we get a new iterate closer to the optimum at (0.5, 1.5).
The main takeaway: each primal-dual iteration solves one linear system and makes progress on all fronts simultaneously.
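Here is a minimal sketch of one primal-dual step on a tiny LP. The instance, starting point, initial multipliers, and σ = 0.1 are all assumptions chosen for illustration:

```python
import numpy as np

# Assumed tiny LP (illustrative numbers):
#   min  -x1 - 2*x2   s.t.  x1 + x2 <= 2,  x2 <= 1.5,  x1 >= 0,  x2 >= 0
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 1.5, 0.0, 0.0])

x = np.array([1.0, 0.5])  # strictly feasible, not optimal
lam = np.ones(4)          # rough initial dual guess
s = b - A @ x             # slacks (0.5, 1.0, 1.0, 0.5)

sigma = 0.1               # centering parameter
mu = lam @ s / len(s)     # duality measure

r_d = c + A.T @ lam       # dual residual (nonzero: lam is not dual feasible)
r_p = A @ x + s - b       # primal residual (zero by construction)
r_c = lam * s - sigma * mu

# Newton system with ds eliminated via ds = -r_p - A @ dx:
#   A^T dlam            = -r_d
#   -Lam A dx + S dlam  = -r_c + Lam r_p
n, m = len(x), len(s)
K = np.zeros((n + m, n + m))
K[:n, n:] = A.T
K[n:, :n] = -lam[:, None] * A
K[n:, n:] = np.diag(s)
rhs = np.concatenate([-r_d, -r_c + lam * r_p])
sol = np.linalg.solve(K, rhs)
dx, dlam = sol[:n], sol[n:]
ds = -r_p - A @ dx

def max_step(v, dv, eta=0.95):
    """Largest alpha in (0, 1] keeping v + alpha*dv strictly positive."""
    neg = dv < 0
    return 1.0 if not neg.any() else min(1.0, eta * float(np.min(-v[neg] / dv[neg])))

alpha = min(max_step(s, ds), max_step(lam, dlam))
x, lam = x + alpha * dx, lam + alpha * dlam
s = b - A @ x
print("alpha:", alpha, " new x:", x, " objective:", c @ x, " mu:", lam @ s / m)
```

The fraction-to-boundary rule (η = 0.95) is what keeps s and λ strictly positive; the single linear solve improves the objective and shrinks the duality measure at the same time.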
Self-concordant barriers
The complexity analysis of interior point methods relies on a special property of the barrier function called self-concordance. A function f is self-concordant if:

|f‴(x)| ≤ 2·f″(x)^(3/2)

in one dimension, with a generalization to higher dimensions. The log barrier is self-concordant, and sums of self-concordant functions are self-concordant.
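As a quick check of this inequality for the basic log barrier term f(x) = −log x (a standard one-line computation):

```latex
f(x) = -\log x, \qquad
f''(x) = \frac{1}{x^{2}}, \qquad
f'''(x) = -\frac{2}{x^{3}}
\quad\Longrightarrow\quad
\lvert f'''(x)\rvert = \frac{2}{x^{3}} = 2\left(\frac{1}{x^{2}}\right)^{3/2} = 2\,f''(x)^{3/2}
```

so −log x satisfies the self-concordance bound with equality for every x > 0.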
Why does this matter? Self-concordance guarantees that Newton’s method converges quadratically from any point in a well-defined neighborhood of the minimizer, without needing a line search. This is what gives interior point methods their theoretical complexity bounds.
The complexity parameter ν of a self-concordant barrier equals the number of constraints m for the standard log barrier. Better barriers with smaller ν exist for specific constraint structures (second-order cones, semidefinite constraints).
Complexity comparison
| Method | Iterations for LP with m constraints, n variables |
|---|---|
| Simplex | Worst case exponential, average polynomial, very fast in practice |
| Basic barrier | O(√m · log(m/ε)) Newton steps (with κ near 1 + 1/√m) |
| Primal-dual IP | O(√m · log(1/ε)) iterations; typically 10 to 50 in practice |
Each interior point iteration costs roughly O(m·n² + n³) for dense problems (forming and factoring the Newton system), much less when sparsity is exploited. For large sparse problems (thousands of variables and up), interior point methods often beat simplex. For small dense problems, simplex can be faster.
Practical considerations
Preprocessing. Modern IP solvers spend significant effort on preprocessing: removing redundant constraints, fixing variables, scaling the problem. This can reduce a million-variable LP to a much smaller equivalent.
Mehrotra’s predictor-corrector. The most important practical improvement. Instead of one Newton step per iteration, take a “predictor” step (pure Newton) and a “corrector” step (centering). This roughly halves the number of outer iterations in practice.
Crossover. Interior point methods find solutions in the interior, not at vertices. If you need an exact vertex solution (for integer programming, for instance), you run a “crossover” procedure that identifies the active constraints and moves to a nearby vertex.
Python sketch: barrier method
```python
import numpy as np

def barrier_method(c, A, b, x0, t0=1.0, kappa=10.0, tol=1e-8, max_outer=50):
    """Solve min c^T x s.t. Ax <= b using the log barrier method."""
    x = x0.copy()
    m = len(b)
    t = t0
    for outer in range(max_outer):
        # Newton's method for the barrier subproblem t*c^T x - sum log(b - Ax)
        for _ in range(50):
            s = b - A @ x  # slacks, must be positive
            grad = t * c + A.T @ (1.0 / s)
            H = A.T @ np.diag(1.0 / s**2) @ A
            dx = np.linalg.solve(H, -grad)
            # Backtracking line search to stay strictly feasible
            step = 1.0
            while np.any(b - A @ (x + step * dx) <= 0):
                step *= 0.5
            x = x + 0.9 * step * dx
            if np.linalg.norm(dx) < 1e-10:
                break
        gap = m / t
        if gap < tol:
            break
        t *= kappa
    return x, c @ x
```
What comes next
Interior point methods solve LPs in polynomial time, but the simplex method is the classic algorithm that started it all. Despite its exponential worst-case complexity, simplex is often the fastest method in practice for small to medium problems. We will walk through it step by step, including the tableau, pivoting, and degeneracy handling.