
The simplex method

In this series (18 parts)
  1. What is optimization and why ML needs it
  2. Convex sets and convex functions
  3. Optimality conditions: first order
  4. Optimality conditions: second order
  5. Line search methods
  6. Least squares: the closed-form solution
  7. Steepest descent (gradient descent)
  8. Newton's method for optimization
  9. Quasi-Newton methods: BFGS and L-BFGS
  10. Conjugate gradient methods
  11. Constrained optimization and Lagrangian duality
  12. KKT conditions
  13. Penalty and barrier methods
  14. Interior point methods
  15. The simplex method
  16. Frank-Wolfe method
  17. Optimization in dynamic programming and optimal control
  18. Stochastic gradient descent and variants

A factory scheduling problem

A factory makes two products. Each product uses machine time and labor, but in different amounts. The factory has limited resources. How do you maximize profit?

|           | Machine hours per unit | Labor hours per unit | Profit per unit ($) |
|-----------|------------------------|----------------------|---------------------|
| Product A | 3                      | 2                    | 5                   |
| Product B | 1                      | 4                    | 4                   |
| Available | 12 hours               | 16 hours             |                     |

You could try every combination of products A and B, but the constraints carve out a region of feasible options. That region is a polygon. The optimal answer always sits at a corner.
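One way to see the corner claim concretely is to enumerate the corners: intersect each pair of constraint boundaries and keep the points that satisfy every constraint. A small numpy sketch, using the data from the table above:

```python
import itertools
import numpy as np

# All constraints as rows of G x <= h, including A >= 0 and B >= 0
G = np.array([[3.0, 1.0],    # machine hours: 3A + 1B <= 12
              [2.0, 4.0],    # labor hours:   2A + 4B <= 16
              [-1.0, 0.0],   # A >= 0
              [0.0, -1.0]])  # B >= 0
h = np.array([12.0, 16.0, 0.0, 0.0])
profit = np.array([5.0, 4.0])

best, best_x = -np.inf, None
for i, j in itertools.combinations(range(len(h)), 2):
    M = G[[i, j]]
    if abs(np.linalg.det(M)) < 1e-12:
        continue  # parallel boundaries never meet in a corner
    x = np.linalg.solve(M, h[[i, j]])   # intersection of the two boundaries
    if np.all(G @ x <= h + 1e-9):       # keep only feasible corners
        if profit @ x > best:
            best, best_x = profit @ x, x
```

The best corner turns out to be $(A, B) = (3.2, 2.4)$ with profit $25.6$, where both resource constraints are tight.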

The feasible region

graph TD
  A["Constraint 1: 3A + 1B <= 12 (machine hours)"]
  B["Constraint 2: 2A + 4B <= 16 (labor hours)"]
  C["Constraint 3: A >= 0, B >= 0"]
  A --> D["Feasible region: a polygon (intersection of half-planes)"]
  B --> D
  C --> D
  D --> E["Optimal solution is at a corner (vertex) of this polygon"]

The simplex method walks along the edges of this polygon, checking one corner at a time, always moving to a better corner until no better neighbor exists. It never needs to search the interior.

Now let’s see the math behind this.

Prerequisites

You should understand Lagrangian duality and the basics of linear algebra, especially matrix operations. Knowing what a convex set is will help you see why the feasible region of an LP has a nice structure.


Linear programs

A linear program (LP) minimizes a linear objective subject to linear constraints. Every LP can be written in standard form:

$$\min_{x} \; c^T x \quad \text{subject to} \quad Ax = b, \quad x \ge 0$$

where $A$ is an $m \times n$ matrix with $m < n$ (more variables than constraints), $b \ge 0$, and $c \in \mathbb{R}^n$.

Any LP with inequality constraints can be converted to this form by adding slack variables. For example:

$$x_1 + 2x_2 \le 10 \quad \Longrightarrow \quad x_1 + 2x_2 + s_1 = 10, \quad s_1 \ge 0$$

Converting to standard form

graph TD
  A["Original LP with inequalities"] --> B["Add slack variable for each <= constraint"]
  B --> C["All constraints become equalities"]
  C --> D["Standard form: min c^T x, Ax = b, x >= 0"]

Why standard form matters

The constraints $Ax = b$ define a flat surface (an affine subspace), and $x \ge 0$ intersects it with the non-negative orthant. The result is a polyhedron with flat faces, edges, and vertices (a polytope when bounded). The simplex method exploits the fact that the optimal solution to an LP always occurs at a vertex of this polyhedron (assuming the optimum exists and is finite).
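The slack-variable conversion is mechanical enough to write down in a few lines of numpy. A minimal sketch (`to_standard_form` is a helper name invented here):

```python
import numpy as np

def to_standard_form(c, A_ub, b_ub):
    """Rewrite min c^T x s.t. A_ub x <= b_ub, x >= 0 in standard form
    min c_s^T x_s s.t. A_s x_s = b_ub, x_s >= 0, by appending one
    zero-cost slack column per inequality."""
    m, n = A_ub.shape
    A_s = np.hstack([A_ub, np.eye(m)])       # identity block = slack columns
    c_s = np.concatenate([c, np.zeros(m)])   # slacks do not affect the objective
    return c_s, A_s, b_ub
```

For the example above, $x_1 + 2x_2 \le 10$ gains one slack column, and any feasible $x$ extends to a feasible $(x, s)$ with $s = b - A_{ub}\,x$.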


Basic feasible solutions

A basic feasible solution (BFS) is a vertex of the feasible polytope. Here is the formal definition.

Given $Ax = b$ with $A$ being $m \times n$, choose $m$ columns of $A$ that form a nonsingular $m \times m$ submatrix $B$. Call these the basic columns; the remaining $n - m$ columns are nonbasic. Set the nonbasic variables to zero and solve $B x_B = b$ for the basic variables $x_B$. If $x_B \ge 0$, then this is a basic feasible solution.

The simplex method moves from one BFS to an adjacent one (adjacent bases share $m - 1$ basic variables) by swapping one variable into and one variable out of the basis.
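The definition suggests a brute-force way to find every vertex: try all $\binom{n}{m}$ column choices. That is exponential and only useful for intuition, but it makes the BFS definition concrete. A sketch (`enumerate_bfs` is a name invented here):

```python
import itertools
import numpy as np

def enumerate_bfs(A, b, tol=1e-9):
    """List all basic feasible solutions of Ax = b, x >= 0
    by trying every choice of m basic columns."""
    m, n = A.shape
    solutions = []
    for cols in itertools.combinations(range(n), m):
        B = A[:, cols]
        if abs(np.linalg.det(B)) < tol:
            continue  # singular submatrix: not a valid basis
        x_B = np.linalg.solve(B, b)
        if np.all(x_B >= -tol):  # feasible only if basic variables are >= 0
            x = np.zeros(n)
            x[list(cols)] = x_B
            solutions.append(x)
    return solutions
```

Running this on the standard-form version of Example 1 below (constraints $6x_1 + 4x_2 + s_1 = 24$ and $x_1 + 2x_2 + s_2 = 6$) yields exactly the four feasible vertices of that polygon, including the optimum $(3, 3/2)$.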


The simplex tableau

The simplex method is most easily tracked with a tableau. Given a current basis $B$, the tableau organizes:

  • The basic variables and their values
  • The reduced costs $\bar{c}_j = c_j - c_B^T B^{-1} A_j$ for each nonbasic variable
  • The constraint coefficients after elimination

The algorithm:

  1. Optimality check: If all reduced costs $\bar{c}_j \ge 0$ for nonbasic variables, the current BFS is optimal. Stop.
  2. Pivot selection (entering variable): Choose a nonbasic variable $j$ with $\bar{c}_j < 0$. This variable will enter the basis.
  3. Ratio test (leaving variable): Among basic variables, find which one hits zero first as $x_j$ increases. This is the minimum ratio test.
  4. Pivot: Swap the entering and leaving variables. Update the tableau.
  5. Go to step 1.

Simplex iteration at a glance

graph TD
  A["Current vertex (BFS)"] --> B["Check: all reduced costs >= 0?"]
  B -- Yes --> C["Optimal! Stop."]
  B -- No --> D["Pick entering variable (negative reduced cost)"]
  D --> E["Ratio test: find leaving variable"]
  E --> F["Pivot: move to adjacent vertex"]
  F --> A

Example 1: Solving a 2-variable LP by simplex

Problem:

$$\max \; 5x_1 + 4x_2$$

subject to:

$$6x_1 + 4x_2 \le 24, \quad x_1 + 2x_2 \le 6, \quad x_1, x_2 \ge 0$$

(We maximize here. To fit the “minimize” convention, negate the objective: $\min \; -5x_1 - 4x_2$.)

Step 1: Add slack variables.

$$6x_1 + 4x_2 + s_1 = 24, \quad x_1 + 2x_2 + s_2 = 6$$

Variables: $x_1, x_2, s_1, s_2$, all $\ge 0$.

Step 2: Initial BFS.

Set $x_1 = 0, x_2 = 0$. Then $s_1 = 24, s_2 = 6$. Basis $= \{s_1, s_2\}$.

Objective value: $5(0) + 4(0) = 0$.

Step 3: Compute reduced costs.

For maximization, we want to increase a variable with a positive coefficient in the objective (equivalently, a negative reduced cost in the min formulation). $x_1$ has coefficient $5$ and $x_2$ has coefficient $4$. Choose $x_1$ (largest coefficient, Dantzig’s rule).

Step 4: Ratio test.

Increase $x_1$. From constraint 1: $6x_1 \le 24 \Rightarrow x_1 \le 4$. From constraint 2: $x_1 \le 6$. Minimum ratio: $\min(24/6, 6/1) = \min(4, 6) = 4$.

So $s_1$ leaves the basis, $x_1$ enters. New basis $= \{x_1, s_2\}$.

Pivot: Divide row 1 by 6:

$$x_1 + \tfrac{2}{3}x_2 + \tfrac{1}{6}s_1 = 4$$

Update row 2: subtract the new row 1 from row 2:

$$(1 - 1)x_1 + \left(2 - \tfrac{2}{3}\right)x_2 + \left(0 - \tfrac{1}{6}\right)s_1 + s_2 = 6 - 4$$
$$\tfrac{4}{3}x_2 - \tfrac{1}{6}s_1 + s_2 = 2$$

New BFS: $x_1 = 4, x_2 = 0, s_1 = 0, s_2 = 2$.

Objective: $5(4) + 4(0) = 20$.

Step 5: Check optimality.

To check optimality, update the objective row. Original objective: $z = 5x_1 + 4x_2$. Substituting $x_1 = 4 - \tfrac{2}{3}x_2 - \tfrac{1}{6}s_1$ from the new row 1:

$$z = 5\left(4 - \tfrac{2}{3}x_2 - \tfrac{1}{6}s_1\right) + 4x_2 = 20 - \tfrac{10}{3}x_2 - \tfrac{5}{6}s_1 + 4x_2$$
$$z = 20 + \tfrac{2}{3}x_2 - \tfrac{5}{6}s_1$$

$x_2$ has a positive coefficient ($\tfrac{2}{3}$), so we can improve. Enter $x_2$.

Step 6: Ratio test for $x_2$.

Row 1: $x_1 + \tfrac{2}{3}x_2 = 4$, so $x_2 \le 4/(2/3) = 6$.

Row 2: $\tfrac{4}{3}x_2 = 2$, so $x_2 \le 2/(4/3) = 3/2$.

Minimum ratio: $3/2$. So $s_2$ leaves, $x_2$ enters. New basis $= \{x_1, x_2\}$.

Pivot: Multiply row 2 by $3/4$:

$$x_2 - \tfrac{1}{8}s_1 + \tfrac{3}{4}s_2 = \tfrac{3}{2}$$

Update row 1: subtract $\tfrac{2}{3} \times$ the new row 2:

$$x_1 + \tfrac{1}{6}s_1 - \tfrac{2}{3}\left(-\tfrac{1}{8}s_1 + \tfrac{3}{4}s_2\right) = 4 - \tfrac{2}{3} \cdot \tfrac{3}{2}$$
$$x_1 + \tfrac{1}{6}s_1 + \tfrac{1}{12}s_1 - \tfrac{1}{2}s_2 = 4 - 1 = 3$$
$$x_1 + \tfrac{1}{4}s_1 - \tfrac{1}{2}s_2 = 3$$

New BFS: $x_1 = 3, x_2 = 3/2, s_1 = 0, s_2 = 0$.

Objective: $z = 20 + \tfrac{2}{3} \cdot \tfrac{3}{2} - \tfrac{5}{6} \cdot 0 = 20 + 1 = 21$.

Check optimality. Update the objective:

$$z = 20 + \tfrac{2}{3}x_2 - \tfrac{5}{6}s_1$$

Substitute $x_2 = \tfrac{3}{2} + \tfrac{1}{8}s_1 - \tfrac{3}{4}s_2$:

$$z = 20 + \tfrac{2}{3}\left(\tfrac{3}{2} + \tfrac{1}{8}s_1 - \tfrac{3}{4}s_2\right) - \tfrac{5}{6}s_1$$
$$z = 20 + 1 + \tfrac{1}{12}s_1 - \tfrac{1}{2}s_2 - \tfrac{5}{6}s_1$$
$$z = 21 - \tfrac{3}{4}s_1 - \tfrac{1}{2}s_2$$

Both nonbasic variables ($s_1, s_2$) have negative coefficients. No improvement possible. Optimal!

Solution: $x_1 = 3, x_2 = 3/2$. Optimal value: $z = 21$.

Verification: $6(3) + 4(3/2) = 18 + 6 = 24$ ✓ and $3 + 2(3/2) = 3 + 3 = 6$ ✓. Both constraints are tight.
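The row operations in this example can be checked mechanically. Here is a small numpy sketch that replays both pivots on the max-form tableau (the last row carries the objective coefficients; after elimination its right-hand entry holds $-z$):

```python
import numpy as np

# Columns: x1, x2, s1, s2 | RHS. Last row is the objective 5x1 + 4x2.
T = np.array([[6.0, 4.0, 1.0, 0.0, 24.0],
              [1.0, 2.0, 0.0, 1.0,  6.0],
              [5.0, 4.0, 0.0, 0.0,  0.0]])

def pivot(T, row, col):
    """Gaussian-elimination pivot: scale the pivot row, clear the column."""
    T = T.copy()
    T[row] /= T[row, col]
    for k in range(T.shape[0]):
        if k != row:
            T[k] -= T[k, col] * T[row]
    return T

T = pivot(T, 0, 0)  # x1 enters, s1 leaves (ratio test gave 4)
T = pivot(T, 1, 1)  # x2 enters, s2 leaves (ratio test gave 3/2)
# Objective row is now [0, 0, -3/4, -1/2, -21],
# matching z = 21 - (3/4)s1 - (1/2)s2 at the optimum.
```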


Vertices of the feasible region

The feasible region of the LP above is a polygon in 2D. Its vertices are the points where pairs of constraint boundaries intersect while remaining feasible:

| Vertex | $x_1$ | $x_2$ | Objective $5x_1 + 4x_2$ |
|---|---|---|---|
| Origin | 0 | 0 | 0 |
| $x_1$-axis | 4 | 0 | 20 |
| Intersection | 3 | 1.5 | 21 |
| $x_2$-axis | 0 | 3 | 12 |

The simplex method visited $(0,0) \to (4,0) \to (3, 1.5)$. It walked along two edges of the polygon to reach the optimum.



Example 2: Starting with a full tableau

Problem:

$$\min \; -x_1 - 2x_2$$

subject to:

$$x_1 + x_2 \le 5, \quad x_1 \le 4, \quad x_2 \le 3, \quad x_1, x_2 \ge 0$$

Add slacks $s_1, s_2, s_3$:

Initial tableau:

| Basis | $x_1$ | $x_2$ | $s_1$ | $s_2$ | $s_3$ | RHS |
|---|---|---|---|---|---|---|
| $s_1$ | 1 | 1 | 1 | 0 | 0 | 5 |
| $s_2$ | 1 | 0 | 0 | 1 | 0 | 4 |
| $s_3$ | 0 | 1 | 0 | 0 | 1 | 3 |
| $z$ | -1 | -2 | 0 | 0 | 0 | 0 |

BFS: $(x_1, x_2) = (0, 0)$, $z = 0$.

Iteration 1: The most negative reduced cost is $-2$, for $x_2$. Enter $x_2$.

Ratios: $5/1 = 5$ for row 1, none for row 2 (coefficient is 0), $3/1 = 3$ for row 3. Minimum: 3. $s_3$ leaves.

Pivot on row 3, column $x_2$. Row 3 is unchanged (its pivot coefficient is already 1):

| Basis | $x_1$ | $x_2$ | $s_1$ | $s_2$ | $s_3$ | RHS |
|---|---|---|---|---|---|---|
| $s_1$ | 1 | 0 | 1 | 0 | -1 | 2 |
| $s_2$ | 1 | 0 | 0 | 1 | 0 | 4 |
| $x_2$ | 0 | 1 | 0 | 0 | 1 | 3 |
| $z$ | -1 | 0 | 0 | 0 | 2 | 6 |

BFS: $(x_1, x_2) = (0, 3)$. The $z$-row RHS stores $-z$, so $z = -6$: an improvement from $0$. Written out in the nonbasic variables, the objective is now $z = -6 - x_1 + 2s_3$.

Iteration 2: $x_1$ has reduced cost $-1$. Enter $x_1$.

Ratios: $2/1 = 2$, $4/1 = 4$. Minimum: 2. $s_1$ leaves.

Pivot on row 1, column $x_1$:

| Basis | $x_1$ | $x_2$ | $s_1$ | $s_2$ | $s_3$ | RHS |
|---|---|---|---|---|---|---|
| $x_1$ | 1 | 0 | 1 | 0 | -1 | 2 |
| $s_2$ | 0 | 0 | -1 | 1 | 1 | 2 |
| $x_2$ | 0 | 1 | 0 | 0 | 1 | 3 |
| $z$ | 0 | 0 | 1 | 0 | 1 | 8 |

The $z$-row now reads $z = -8 + s_1 + s_3$ (again, the RHS stores $-z$). All reduced costs are non-negative. Optimal!

Solution: $x_1 = 2, x_2 = 3$. Objective: $-(2) - 2(3) = -8$.

Check: $2 + 3 = 5 \le 5$ ✓, $2 \le 4$ ✓, $3 \le 3$ ✓.


Example 3: Identifying degeneracy

Problem:

$$\min \; -x_1 - x_2$$

subject to:

$$x_1 + x_2 \le 4, \quad x_1 \le 2, \quad x_2 \le 2, \quad x_1, x_2 \ge 0$$

The optimum is $(2, 2)$ with objective $-4$. But notice something: at $(2, 2)$, all three constraints are active ($x_1 + x_2 = 4$, $x_1 = 2$, $x_2 = 2$). We have three active constraints but only two variables. This means the vertex $(2, 2)$ is degenerate: more constraints are active than necessary.

Why degeneracy matters:

In a degenerate BFS, some basic variables equal zero. During a pivot, the ratio test might give a ratio of zero, meaning the objective does not improve. The simplex method can cycle: it pivots through a sequence of degenerate bases that all represent the same point, never making progress.

How to handle it:

  • Bland’s rule: Always choose the entering and leaving variables with the smallest index. This guarantees termination.
  • Lexicographic rule: Break ties in the ratio test using a lexicographic comparison.
  • Perturbation: Add tiny random perturbations to $b$ to make the problem non-degenerate.

In practice, cycling is extremely rare. Most solvers use Bland’s rule as a fallback.

Let us trace through. Starting from $(0, 0)$ with slacks $(4, 2, 2)$:

Iteration 1: Enter $x_1$ (reduced cost $-1$). Ratios: $4/1 = 4$, $2/1 = 2$. $s_2$ leaves. New BFS: $(2, 0)$.

Iteration 2: Enter $x_2$ (reduced cost $-1$). Ratios: $(4-2)/1 = 2$ for $s_1$ and $2/1 = 2$ for $s_3$. A tie! Both give ratio 2. Pick $s_1$ by Bland’s rule (smaller index).

New BFS: $(2, 2)$. All slacks are zero. This is a degenerate vertex: the basis $\{x_1, x_2, s_3\}$ contains a basic variable at zero ($s_3 = 0$), reflecting the three active constraints.

Iteration 3: Check reduced costs. Both nonbasic variables $s_1$ and $s_2$ have non-negative reduced costs. Optimal.

Solution: $(2, 2)$, objective $-4$.
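The pivot-selection side of Bland's rule fits in a few lines. A sketch with helper names invented here (`col` and `rhs` are the entering variable's tableau column and the right-hand side; `basis` maps rows to basic-variable indices):

```python
def blands_entering(reduced_costs, tol=1e-10):
    """Bland's rule, entering side: the lowest-indexed variable with a
    negative reduced cost enters (None means the basis is optimal)."""
    for j, c_bar in enumerate(reduced_costs):
        if c_bar < -tol:
            return j
    return None

def blands_leaving(col, rhs, basis, tol=1e-10):
    """Bland's rule, leaving side: among rows tied at the minimum ratio,
    the row whose basic variable has the smallest index leaves."""
    best_ratio = best_var = best_row = None
    for i in range(len(rhs)):
        if col[i] <= tol:
            continue  # this row never blocks the entering variable
        r = rhs[i] / col[i]
        if (best_ratio is None or r < best_ratio - tol
                or (abs(r - best_ratio) <= tol and basis[i] < best_var)):
            best_ratio, best_var, best_row = r, basis[i], i
    return best_row  # None signals an unbounded direction
```

On the tie in iteration 2 above (rows for $s_1$ and $s_3$ both at ratio 2), the leaving rule picks the row whose basic variable has the smaller index, i.e. $s_1$.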


Complexity

Worst case: The simplex method can take exponentially many steps. The classic Klee-Minty cube constructs a polytope in $n$ dimensions on which the simplex method (with Dantzig’s rule) visits all $2^n$ vertices.

Average case: Smoothed analysis by Spielman and Teng showed that the simplex method runs in polynomial expected time when the input is slightly perturbed. In practice, for an LP with $m$ constraints and $n$ variables, the simplex method typically takes on the order of $m$ to $3m$ iterations.

Each iteration costs $O(mn)$ for a dense tableau update, but sparse implementations can be much faster.

This is why the simplex method remains competitive with interior point methods despite its exponential worst case: the bad cases are pathological and never show up in real problems.


Duality in the simplex method

The simplex method implicitly maintains dual variables. At a BFS with basis $B$, the dual variables are:

$$y = (B^{-1})^T c_B$$

The reduced cost of a nonbasic variable $j$ is:

$$\bar{c}_j = c_j - y^T A_j$$

When all reduced costs are non-negative, $y$ is dual feasible and complementary slackness holds. The primal and dual objectives are equal, confirming optimality. This is a direct application of Lagrangian duality.
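These identities are easy to verify numerically on Example 1 (in min form, with the final basis $\{x_1, x_2\}$ found earlier); the variable ordering here is an assumption of this sketch:

```python
import numpy as np

# Example 1 in min form over (x1, x2, s1, s2): c = [-5, -4, 0, 0]
A = np.array([[6.0, 4.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
b = np.array([24.0, 6.0])
c = np.array([-5.0, -4.0, 0.0, 0.0])

basic = [0, 1]                      # final basis {x1, x2}
B = A[:, basic]
y = np.linalg.solve(B.T, c[basic])  # y = (B^{-1})^T c_B

x_B = np.linalg.solve(B, b)         # primal basic solution (3, 3/2)
primal = c[basic] @ x_B             # primal objective: -21
dual = b @ y                        # dual objective b^T y: also -21
reduced = c - A.T @ y               # all reduced costs >= 0 at optimality
```

Strong duality shows up as `primal == dual`, and the reduced costs of the slacks come out as $3/4$ and $1/2$, exactly the coefficients in the final objective row $z = 21 - \tfrac{3}{4}s_1 - \tfrac{1}{2}s_2$.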


Big-M and two-phase method

Sometimes finding an initial BFS is not obvious (when the original constraints are equalities or $\ge$ inequalities). Two approaches:

Big-M method: Add artificial variables $a_i$ with a huge cost $M$ to the objective. If the optimal solution has all $a_i = 0$, the original problem is feasible.

Two-phase method:

  • Phase I: Minimize the sum of artificial variables. If the minimum is zero, you have a BFS for the original problem.
  • Phase II: Use that BFS to start the simplex method on the original problem.

The two-phase method is numerically more stable because it avoids the large coefficient $M$.
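Setting up Phase I is mechanical. A sketch (`phase_one` is a helper name invented here): negate any row with $b_i < 0$ so that setting every artificial variable $a_i = b_i$ gives an immediate BFS.

```python
import numpy as np

def phase_one(A, b):
    """Build the Phase I problem for Ax = b, x >= 0:
    minimize sum(a) subject to Ax + Ia = b, x >= 0, a >= 0.
    Rows with b_i < 0 are negated first so a = b is an obvious BFS."""
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    neg = b < 0
    A[neg] *= -1.0
    b[neg] *= -1.0
    m, n = A.shape
    A1 = np.hstack([A, np.eye(m)])                  # artificial columns
    c1 = np.concatenate([np.zeros(n), np.ones(m)])  # cost: sum of artificials
    return c1, A1, b
```

If Phase I ends with objective zero, dropping the artificial columns leaves a BFS for the original problem, which seeds Phase II.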


When to use simplex vs interior point

| Scenario | Recommended |
|---|---|
| Small LP (< 1000 variables) | Simplex |
| Large sparse LP | Interior point |
| Need many solves with small changes | Simplex (warm-starting) |
| Quadratic or conic program | Interior point |
| Need exact vertex solution | Simplex |

Simplex excels at warm-starting: if you change one constraint or one cost coefficient, the previous basis is often still nearly optimal, and a few pivots suffice. Interior point methods essentially start from scratch each time.


Python implementation sketch

import numpy as np

def simplex_tableau(c, A, b):
    """Solve min c^T x s.t. Ax <= b, x >= 0 using the simplex method.

    Assumes b >= 0, so the all-slack basis is an initial BFS."""
    m, n = A.shape
    # Add slack variables
    tableau = np.zeros((m + 1, n + m + 1))
    tableau[:m, :n] = A
    tableau[:m, n:n+m] = np.eye(m)
    tableau[:m, -1] = b
    tableau[-1, :n] = c  # objective row: reduced costs; its RHS stores -z

    basis = list(range(n, n + m))  # slack variables

    while True:
        # Find entering variable (most negative reduced cost)
        reduced_costs = tableau[-1, :-1]
        j = np.argmin(reduced_costs)
        if reduced_costs[j] >= -1e-10:
            break  # optimal

        # Ratio test
        col = tableau[:m, j]
        rhs = tableau[:m, -1]
        ratios = np.full(m, np.inf)
        for i in range(m):
            if col[i] > 1e-10:
                ratios[i] = rhs[i] / col[i]
        i = np.argmin(ratios)
        if ratios[i] == np.inf:
            raise ValueError("Unbounded")

        # Pivot
        pivot = tableau[i, j]
        tableau[i] /= pivot
        for k in range(m + 1):
            if k != i:
                tableau[k] -= tableau[k, j] * tableau[i]
        basis[i] = j

    x = np.zeros(n)
    for i, var in enumerate(basis):
        if var < n:
            x[var] = tableau[i, -1]

    return x, -tableau[-1, -1]  # negate: the objective-row RHS holds -z

What comes next

The simplex method walks along edges of the feasible polytope. The Frank-Wolfe method borrows a similar idea for nonlinear objectives: at each step, it solves a linear subproblem over the constraint set and takes a step toward that solution. This yields sparse iterates and works well for large structured problems.
