
KKT conditions

In this series (18 parts)
  1. What is optimization and why ML needs it
  2. Convex sets and convex functions
  3. Optimality conditions: first order
  4. Optimality conditions: second order
  5. Line search methods
  6. Least squares: the closed-form solution
  7. Steepest descent (gradient descent)
  8. Newton's method for optimization
  9. Quasi-Newton methods: BFGS and L-BFGS
  10. Conjugate gradient methods
  11. Constrained optimization and Lagrangian duality
  12. KKT conditions
  13. Penalty and barrier methods
  14. Interior point methods
  15. The simplex method
  16. Frank-Wolfe method
  17. Optimization in dynamic programming and optimal control
  18. Stochastic gradient descent and variants

Prerequisites: This article builds directly on Lagrangian duality. You should be comfortable with Lagrange multipliers for equality-constrained problems before continuing.

A factory with resource constraints

A factory makes two products, A and B. Each unit of A earns $5 profit, each unit of B earns $4. But resources are limited:

| Resource | A uses per unit | B uses per unit | Available |
|---|---|---|---|
| Material | 2 kg | 1 kg | 10 kg |
| Labor | 1 hr | 2 hr | 8 hr |

The factory wants to maximize profit. The unconstrained answer would be “make infinitely many of both,” but the constraints fence in the possibilities. Only certain combinations of A and B are feasible.

The optimal point sits on the boundary of the feasible region, where the profit contours just touch the constraint walls. KKT conditions tell you exactly when a point is optimal for a constrained problem. They are a checklist: if all four conditions pass, you have found the answer (at least for convex problems).
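
The factory problem is a linear program, so you can check the intuition numerically. A minimal sketch with scipy.optimize.linprog (scipy is assumed available, as in the code later in this article); linprog minimizes, so the profit is negated:

```python
from scipy.optimize import linprog

# Maximize 5A + 4B  <=>  minimize -5A - 4B
c = [-5, -4]

# Resource limits: 2A + B <= 10 (material), A + 2B <= 8 (labor)
A_ub = [[2, 1], [1, 2]]
b_ub = [10, 8]

# Production quantities cannot be negative
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])

print(res.x)     # optimal production plan
print(-res.fun)  # maximum profit
```

The optimal plan is A = 4, B = 2 for a profit of 28, and both resource constraints hold with equality there: the optimum sits exactly where the two constraint walls meet.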

Feasible region with objective contours

graph TD
  A["Feasible region:<br/>all constraint inequalities satisfied"] --> B["Objective contours<br/>sweep across the region"]
  B --> C["Optimal point: where the best contour<br/>just touches the feasible boundary"]
  C --> D["KKT conditions:<br/>four checks that confirm optimality"]
  style C fill:#ccffcc
  style D fill:#ccffcc

The relationship: unconstrained to equality to inequality constraints

graph LR
  A["Unconstrained:<br/>set gradient = 0"] --> B["Equality constraints:<br/>Lagrange multipliers"]
  B --> C["Inequality constraints:<br/>KKT conditions"]
  A -->|"Simplest case"| D["No constraints to worry about"]
  B -->|"Adds multipliers for equalities"| E["Gradient of Lagrangian = 0"]
  C -->|"Adds sign constraints<br/>and complementary slackness"| F["Four conditions to check"]

KKT conditions are the most general first-order optimality conditions for smooth constrained problems. Everything else is a special case.

From equality to inequality constraints

The Lagrangian method handles equality constraints cleanly. You set up a Lagrangian, take gradients, set them to zero, and solve. But real optimization problems rarely come with only equalities. Budget limits, capacity bounds, non-negativity requirements: these are all inequalities.

The general constrained optimization problem looks like this:

$$\min_{x} f(x) \quad \text{subject to} \quad g_i(x) \leq 0, \; i = 1, \ldots, m \quad \text{and} \quad h_j(x) = 0, \; j = 1, \ldots, p$$

The Karush-Kuhn-Tucker (KKT) conditions extend the Lagrangian framework to handle these inequality constraints. They give you a checklist: if a point passes all four conditions, you know it is a candidate for optimality. For convex problems, passing the checklist is enough to guarantee optimality.

The four KKT conditions

Given the problem above, form the Lagrangian:

$$\mathcal{L}(x, \mu, \lambda) = f(x) + \sum_{i=1}^{m} \mu_i g_i(x) + \sum_{j=1}^{p} \lambda_j h_j(x)$$

Here $\mu_i$ are the multipliers for inequality constraints and $\lambda_j$ are the multipliers for equality constraints. A point $x^*$ with multipliers $\mu^*$ and $\lambda^*$ satisfies the KKT conditions if all four of the following hold:

1. Stationarity

$$\nabla f(x^*) + \sum_{i=1}^{m} \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^{p} \lambda_j^* \nabla h_j(x^*) = 0$$

The gradient of the objective plus the weighted gradients of all constraints must sum to zero. This is the same idea as setting the gradient of the Lagrangian to zero.

2. Primal feasibility

$$g_i(x^*) \leq 0 \quad \text{for all } i, \qquad h_j(x^*) = 0 \quad \text{for all } j$$

The solution must actually satisfy all the constraints. No cheating.

3. Dual feasibility

$$\mu_i^* \geq 0 \quad \text{for all } i$$

The multipliers on inequality constraints must be non-negative. This is the key difference from equality constraints, where $\lambda_j$ can be any real number. The sign restriction on $\mu_i$ reflects the fact that inequality constraints can only “push” the solution in one direction.

4. Complementary slackness

$$\mu_i^* g_i(x^*) = 0 \quad \text{for all } i$$

For each inequality constraint, either the constraint is active ($g_i(x^*) = 0$) or its multiplier is zero ($\mu_i^* = 0$), or both. This is the most interesting condition, and it deserves its own section.
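
The four conditions translate directly into a numeric checker. The sketch below is my own illustration (the function name check_kkt, the tolerance, and the toy problem are all assumptions, not from any library): it takes the objective gradient, the inequality constraints with their gradients, and a candidate point with multipliers, then tests each condition in turn.

```python
import numpy as np

def check_kkt(grad_f, gs, grad_gs, x, mu, tol=1e-8):
    """Check the four KKT conditions (inequality constraints only)."""
    x = np.atleast_1d(x).astype(float)
    mu = np.atleast_1d(mu).astype(float)

    # 1. Stationarity: grad f + sum_i mu_i grad g_i = 0
    r = grad_f(x) + sum(m * gg(x) for m, gg in zip(mu, grad_gs))
    stationarity = np.allclose(r, 0, atol=tol)

    # 2. Primal feasibility: g_i(x) <= 0
    primal = all(g(x) <= tol for g in gs)

    # 3. Dual feasibility: mu_i >= 0
    dual = all(m >= -tol for m in mu)

    # 4. Complementary slackness: mu_i * g_i(x) = 0
    slack = all(abs(m * g(x)) <= tol for m, g in zip(mu, gs))

    return stationarity and primal and dual and slack

# Toy problem: minimize x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
# Stationarity 2x - mu = 0 on the active boundary x = 1 gives mu = 2.
grad_f = lambda x: 2 * x
g      = lambda x: 1 - x[0]
grad_g = lambda x: np.array([-1.0])

print(check_kkt(grad_f, [g], [grad_g], [1.0], [2.0]))  # True: a KKT point
print(check_kkt(grad_f, [g], [grad_g], [0.0], [0.0]))  # False: infeasible
```

In practice you rarely write such a checker yourself, but it makes concrete what a solver verifies at termination.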

Complementary slackness explained

Think about it this way. If a constraint is inactive, meaning $g_i(x^*) < 0$, the solution is strictly inside the feasible region with respect to that constraint. The constraint is not “touching” the solution. It might as well not exist. So its multiplier should be zero, because removing that constraint would not change the optimal value.

On the other hand, if a constraint is active, meaning $g_i(x^*) = 0$, the solution sits right on the boundary of that constraint. The constraint is binding. Its multiplier $\mu_i^*$ can be positive, reflecting the “cost” of that constraint being there.

The product $\mu_i^* g_i(x^*) = 0$ captures both cases in one equation. It is called complementary slackness because “slack” ($g_i < 0$) and a positive multiplier ($\mu_i > 0$) cannot coexist.

Complementary slackness: active vs inactive constraints

graph TD
  A["Constraint i"] --> B{"Is constraint active?<br/>g_i = 0?"}
  B -->|"Yes: active, tight"| C["Constraint is binding<br/>Multiplier mu_i can be positive"]
  B -->|"No: inactive, slack"| D["Constraint has room<br/>Multiplier mu_i must be 0"]
  C --> E["mu_i * g_i = mu_i * 0 = 0 ✓"]
  D --> F["mu_i * g_i = 0 * g_i = 0 ✓"]
  style C fill:#ffcccc
  style D fill:#ccffcc

This gives you a powerful solving strategy: for each inequality constraint, consider two cases. Either the constraint is active, or $\mu_i = 0$. Try each combination and see which one leads to a consistent solution.
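
For small problems with a quadratic objective and linear constraints, this case enumeration can be automated: for each subset of constraints assumed active, stationarity plus the active constraints form a linear system. The sketch below is my own illustration (the problem is the same quadratic program solved as Example 3, Part B later in this article): it enumerates every active set for minimizing $(x-3)^2 + (y-3)^2$ subject to $x + y \leq 4$, $x \geq 0$, $y \geq 0$, and keeps only the candidates that pass primal and dual feasibility.

```python
import itertools
import numpy as np

# Constraints g_i(z) <= 0 written as a_i . z + b_i <= 0 for z = (x, y)
A = np.array([[ 1.0,  1.0],   # g1: x + y - 4 <= 0
              [-1.0,  0.0],   # g2: -x <= 0
              [ 0.0, -1.0]])  # g3: -y <= 0
b = np.array([-4.0, 0.0, 0.0])

tol = 1e-9
kkt_points = []

# Try every possible active set: each constraint is either active or mu_i = 0
for size in range(4):
    for active in itertools.combinations(range(3), size):
        # Unknowns: x, y, and mu_i for each active i (inactive mu_i = 0).
        # Stationarity: 2(x - 3) + sum mu_i a_ix = 0 and 2(y - 3) + sum mu_i a_iy = 0.
        n = 2 + len(active)
        M = np.zeros((n, n))
        rhs = np.zeros(n)
        M[0, 0] = M[1, 1] = 2.0
        rhs[:2] = 6.0
        for k, i in enumerate(active):
            M[:2, 2 + k] = A[i]   # mu_i multiplies grad g_i in stationarity
            M[2 + k, :2] = A[i]   # active constraint: a_i . z + b_i = 0
            rhs[2 + k] = -b[i]
        try:
            sol = np.linalg.solve(M, rhs)
        except np.linalg.LinAlgError:
            continue  # this combination of active constraints is degenerate
        z, mu = sol[:2], sol[2:]
        # Keep only candidates that are primal and dual feasible
        if np.all(A @ z + b <= tol) and np.all(mu >= -tol):
            kkt_points.append((z, active, mu))

for z, active, mu in kkt_points:
    print(f"KKT point {z} with active set {active} and multipliers {mu}")
```

Only one combination survives: the point $(2, 2)$ with the first constraint active. This brute-force approach scales as $2^m$ in the number of constraints, which is exactly why practical solvers use smarter active-set or interior point strategies.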

Interpreting the multipliers

The multiplier $\mu_i^*$ tells you the sensitivity of the optimal value to constraint $i$. If you relax the constraint slightly, changing $g_i(x) \leq 0$ to $g_i(x) \leq \epsilon$, the optimal objective value changes by approximately $-\mu_i^* \epsilon$. A large multiplier means the constraint is “expensive”: it is costing you a lot in objective value.

If $\mu_i^* = 0$, the constraint is not affecting the solution at all. You could remove it and nothing would change.

Worked Example 1: Minimizing distance with a linear constraint

Problem: Minimize $f(x, y) = x^2 + y^2$ subject to $x + y \geq 1$.

Contour plot of $f(x, y) = x^2 + y^2$ with the inequality constraint $x + y \geq 1$. The shaded region is feasible. The KKT point $(1/2, 1/2)$ lies on the constraint boundary, where the gradient of $f$ is proportional to the gradient of the constraint.

Step 1: Rewrite in standard form. The KKT framework uses $g_i(x) \leq 0$, so rewrite the constraint:

$$g(x, y) = -x - y + 1 \leq 0$$

Step 2: Write the KKT conditions.

Stationarity:

$$\nabla f + \mu \nabla g = 0 \implies \begin{bmatrix} 2x \\ 2y \end{bmatrix} + \mu \begin{bmatrix} -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

So: $2x - \mu = 0$ and $2y - \mu = 0$.

Primal feasibility: $-x - y + 1 \leq 0$, i.e., $x + y \geq 1$.

Dual feasibility: $\mu \geq 0$.

Complementary slackness: $\mu(-x - y + 1) = 0$.

Step 3: Consider two cases.

Case A: Constraint inactive ($\mu = 0$).

From stationarity: $2x = 0$ and $2y = 0$, giving $x = 0$, $y = 0$.

Check primal feasibility: $0 + 0 = 0 \geq 1$? No. ✗

The unconstrained minimum $(0, 0)$ violates the constraint. This case fails.

Case B: Constraint active ($g = 0$, so $x + y = 1$).

From stationarity: $x = \mu/2$ and $y = \mu/2$, so $x = y$.

Substituting into $x + y = 1$: $2x = 1$, so $x = 1/2$, $y = 1/2$.

From stationarity: $\mu = 2x = 1$.

Step 4: Verify all four conditions at $(x^*, y^*) = (1/2, 1/2)$, $\mu^* = 1$.

  • ✓ Stationarity: $2(1/2) - 1 = 0$ and $2(1/2) - 1 = 0$.
  • ✓ Primal feasibility: $-(1/2) - (1/2) + 1 = 0 \leq 0$.
  • ✓ Dual feasibility: $\mu^* = 1 \geq 0$.
  • ✓ Complementary slackness: $1 \cdot 0 = 0$.

The optimal value is $f(1/2, 1/2) = 1/4 + 1/4 = 1/2$.
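
As a sanity check, a numerical solver should land on the same point. A minimal sketch using scipy.optimize.minimize with SLSQP (the same setup as the implementation section later in this article), with the constraint written in scipy's fun(x) >= 0 convention:

```python
from scipy.optimize import minimize

# Example 1: minimize x^2 + y^2 subject to x + y >= 1
result = minimize(
    lambda v: v[0]**2 + v[1]**2,
    x0=[0.0, 0.0],
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda v: v[0] + v[1] - 1}],
)

print(result.x)    # expect approximately [0.5, 0.5]
print(result.fun)  # expect approximately 0.5
```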

Worked Example 2: A 1D bound constraint

Problem: Minimize $f(x) = (x - 3)^2$ subject to $x \leq 2$.

This is a simple one-dimensional problem. The objective wants $x = 3$, but the constraint says $x$ cannot exceed 2.

Step 1: Standard form. The constraint is already in standard form: $g(x) = x - 2 \leq 0$.

Step 2: Write the KKT conditions.

Stationarity:

$$f'(x) + \mu g'(x) = 0 \implies 2(x - 3) + \mu = 0$$

Primal feasibility: $x - 2 \leq 0$.

Dual feasibility: $\mu \geq 0$.

Complementary slackness: $\mu(x - 2) = 0$.

Step 3: Consider two cases.

Case A: Constraint inactive ($\mu = 0$).

From stationarity: $2(x - 3) = 0$, so $x = 3$.

Check primal feasibility: $3 - 2 = 1 \leq 0$? No. ✗

The unconstrained optimum is infeasible.

Case B: Constraint active ($x = 2$).

From stationarity: $2(2 - 3) + \mu = 0$, so $-2 + \mu = 0$, giving $\mu = 2$.

Step 4: Verify all four conditions at $x^* = 2$, $\mu^* = 2$.

  • ✓ Stationarity: $2(2 - 3) + 2 = -2 + 2 = 0$.
  • ✓ Primal feasibility: $2 - 2 = 0 \leq 0$.
  • ✓ Dual feasibility: $\mu^* = 2 \geq 0$.
  • ✓ Complementary slackness: $2 \cdot (2 - 2) = 2 \cdot 0 = 0$.

The optimal value is $f(2) = (2 - 3)^2 = 1$.

The multiplier $\mu^* = 2$ tells you something concrete. If you relax the constraint to $x \leq 2.1$, the optimal value decreases by approximately $2 \times 0.1 = 0.2$, from $1.0$ to about $0.81$ (since $(2.1 - 3)^2 = 0.81$).
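
This sensitivity prediction is easy to test numerically. A small sketch (my own check, using the same SLSQP setup as later in this article) solves the problem with the original and the relaxed bound, then compares the change in optimal value to the first-order estimate $\mu^* \epsilon$:

```python
from scipy.optimize import minimize

def solve(bound):
    """Minimize (x - 3)^2 subject to x <= bound."""
    res = minimize(
        lambda v: (v[0] - 3) ** 2,
        x0=[0.0],
        method="SLSQP",
        constraints=[{"type": "ineq", "fun": lambda v: bound - v[0]}],
    )
    return res.fun

f_tight = solve(2.0)    # ~1.00
f_relaxed = solve(2.1)  # ~0.81

print(f_tight - f_relaxed)  # ~0.19, close to mu * eps = 2 * 0.1 = 0.2
```

The actual change (0.19) differs slightly from the first-order estimate (0.2) because the multiplier only gives the instantaneous rate of change; for a finite relaxation the curvature of the objective matters too.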

Worked Example 3: A constrained quadratic program

Part A: Unconstrained minimum is feasible

Problem: Minimize $f(x, y) = x^2 + y^2$ subject to $x + y \leq 4$, $x \geq 0$, $y \geq 0$.

Step 1: Standard form. Rewrite all constraints as $g_i \leq 0$:

$$g_1(x, y) = x + y - 4 \leq 0, \quad g_2(x, y) = -x \leq 0, \quad g_3(x, y) = -y \leq 0$$

Step 2: Check the unconstrained minimum. $\nabla f = (2x, 2y) = 0$ gives $(x, y) = (0, 0)$.

Is $(0, 0)$ feasible?

  • $g_1 = 0 + 0 - 4 = -4 \leq 0$
  • $g_2 = 0 \leq 0$
  • $g_3 = 0 \leq 0$

Yes. The unconstrained minimum is feasible. (Note that $g_2$ and $g_3$ hold with equality at $(0, 0)$, so those constraints are technically active, but since $\nabla f$ already vanishes there, no constraint needs to push on the solution.) All multipliers can be taken as zero: $\mu_1 = \mu_2 = \mu_3 = 0$.

Step 3: Verify KKT conditions at $(0, 0)$ with $\mu_1 = \mu_2 = \mu_3 = 0$.

  • ✓ Stationarity: $\nabla f + \mu_1 \nabla g_1 + \mu_2 \nabla g_2 + \mu_3 \nabla g_3 = (0, 0) + 0 + 0 + 0 = (0, 0)$.
  • ✓ Primal feasibility: all constraints satisfied as shown above.
  • ✓ Dual feasibility: all $\mu_i = 0 \geq 0$.
  • ✓ Complementary slackness: $0 \cdot (-4) = 0$, $0 \cdot 0 = 0$, $0 \cdot 0 = 0$.

Optimal value: $f(0, 0) = 0$.
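
A quick numerical check of Part A (the same SLSQP setup as the implementation section later in this article) confirms that the solver stays at the interior optimum:

```python
from scipy.optimize import minimize

# Part A: minimize x^2 + y^2 subject to x + y <= 4, x >= 0, y >= 0
result = minimize(
    lambda v: v[0]**2 + v[1]**2,
    x0=[1.0, 1.0],
    method="SLSQP",
    constraints=[
        {"type": "ineq", "fun": lambda v: 4 - v[0] - v[1]},  # x + y <= 4
        {"type": "ineq", "fun": lambda v: v[0]},             # x >= 0
        {"type": "ineq", "fun": lambda v: v[1]},             # y >= 0
    ],
)

print(result.x)    # expect approximately [0, 0]
print(result.fun)  # expect approximately 0
```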

Part B: Unconstrained minimum is infeasible

Problem: Minimize $f(x, y) = (x - 3)^2 + (y - 3)^2$ subject to $x + y \leq 4$, $x \geq 0$, $y \geq 0$.

The unconstrained minimum is $(3, 3)$, but $3 + 3 = 6 > 4$. The point violates the first constraint.

Step 1: Which constraints are active? By symmetry (the objective and the binding constraint $x + y = 4$ treat $x$ and $y$ identically), expect $x = y$ at the solution. If $x = y$ and $x + y = 4$, then $x = y = 2$. Both $x \geq 0$ and $y \geq 0$ are satisfied but not active.

So assume $g_1$ is active ($x + y = 4$) and $\mu_2 = \mu_3 = 0$.

Step 2: KKT stationarity with only $\mu_1$ active.

$$\nabla f + \mu_1 \nabla g_1 = 0 \implies \begin{bmatrix} 2(x - 3) \\ 2(y - 3) \end{bmatrix} + \mu_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

From the first equation: $2(x - 3) + \mu_1 = 0$.

From the second equation: $2(y - 3) + \mu_1 = 0$.

Subtracting: $2(x - 3) = 2(y - 3)$, so $x = y$.

With $x + y = 4$: $x = y = 2$.

Substituting back: $2(2 - 3) + \mu_1 = 0$, so $-2 + \mu_1 = 0$, giving $\mu_1 = 2$.

Step 3: Verify all KKT conditions at $(2, 2)$, $\mu_1 = 2$, $\mu_2 = \mu_3 = 0$.

  • ✓ Stationarity: $2(2 - 3) + 2 = 0$ for both components.
  • ✓ Primal feasibility: $2 + 2 - 4 = 0 \leq 0$; $-2 \leq 0$; $-2 \leq 0$.
  • ✓ Dual feasibility: $\mu_1 = 2 \geq 0$, $\mu_2 = 0 \geq 0$, $\mu_3 = 0 \geq 0$.
  • ✓ Complementary slackness: $2 \cdot 0 = 0$; $0 \cdot (-2) = 0$; $0 \cdot (-2) = 0$.

Optimal value: $f(2, 2) = (2 - 3)^2 + (2 - 3)^2 = 1 + 1 = 2$.

Again, $\mu_1 = 2$ tells you that relaxing the constraint $x + y \leq 4$ to $x + y \leq 4.1$ would decrease the optimal value by about $0.2$.

KKT for convex vs non-convex problems

For convex problems (convex objective, convex inequality constraints, affine equality constraints), the KKT conditions are both necessary and sufficient for global optimality. If you find a point satisfying all four conditions, you have the global minimum. No further checking needed.

All three examples above were convex, so verifying KKT was enough to confirm optimality.

For non-convex problems, KKT conditions are only necessary (under a technical condition called constraint qualification). A point can satisfy all four KKT conditions and still fail to be a global minimum: it might be only a local minimum, a saddle point, or even a local maximum. You would need second-order conditions involving the Hessian to distinguish these cases.

Constraint qualification is a regularity condition that ensures the KKT conditions actually apply. The most common one is the Linear Independence Constraint Qualification (LICQ): the gradients of the active constraints at xx^* must be linearly independent. In practice, LICQ holds for most well-posed problems. When it fails, you can get situations where the KKT conditions have no solution even though an optimal point exists.
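
A standard textbook illustration of this failure (a well-known example, not specific to this series): minimize $f(x) = x$ subject to the single constraint $g(x) = x^2 \leq 0$. The only feasible point is $x^* = 0$, so it is trivially optimal. But stationarity would require

$$f'(x^*) + \mu \, g'(x^*) = 1 + \mu \cdot 2x^* = 1 + \mu \cdot 0 = 0,$$

which no multiplier $\mu$ can satisfy. The culprit is $\nabla g(x^*) = 0$: the active constraint gradient is not linearly independent, LICQ fails, and the KKT system has no solution even though the optimum exists.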

Connection to machine learning

KKT conditions are not just theoretical. They appear directly in several ML algorithms.

Support vector machines. The SVM optimization problem minimizes $\|w\|^2$ subject to the constraint that each training point is correctly classified with a margin: $y_i(w \cdot x_i + b) \geq 1$. The KKT conditions for this problem tell you that:

  • Most data points have $\mu_i = 0$ (they are not support vectors).
  • The support vectors are the points where the constraint is active ($\mu_i > 0$).

Complementary slackness is what makes SVMs sparse: only a few training points actually determine the decision boundary.
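
You can see this sparsity on a toy dataset. The sketch below is my own illustration (the data points and tolerances are arbitrary choices): it solves the hard-margin SVM primal directly with SLSQP and then counts which margin constraints are active.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable dataset: two points per class
X = np.array([[0.0, 0.0], [-1.0, -1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Variables v = (w1, w2, b); minimize ||w||^2
# subject to y_i (w . x_i + b) >= 1 for every training point
def margin(v, i):
    w, b = v[:2], v[2]
    return y[i] * (X[i] @ w + b) - 1.0

result = minimize(
    lambda v: v[0]**2 + v[1]**2,
    x0=[0.0, 0.0, 0.0],
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": margin, "args": (i,)}
                 for i in range(len(X))],
)

w, b = result.x[:2], result.x[2]
# Active margin constraints = support vectors
active = [i for i in range(len(X)) if abs(margin(result.x, i)) < 1e-3]
print(f"w = {w}, b = {b:.3f}")
print(f"active (support) points: {active}")
```

Only the two points closest to the boundary come out active; the outer points have slack in their constraints, so by complementary slackness their multipliers are zero and they could be deleted without moving the separating hyperplane.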

Regularized regression. L1 regularization (lasso) can be formulated as a constrained optimization problem: minimize the squared error subject to $\|w\|_1 \leq t$. The KKT conditions explain why lasso produces sparse solutions: stationarity involves the subgradient of the L1 norm, which is a whole interval at zero rather than a single value, so a coefficient can sit exactly at zero whenever its feature’s correlation with the residual is small enough. Geometrically, the optimum tends to land on a corner of the L1 ball, where some coordinates are exactly zero.
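
For the special case of an orthonormal design matrix, this KKT reasoning yields a closed form: each lasso coefficient is the ordinary least-squares coefficient pushed toward zero by soft thresholding. A minimal numpy sketch (the coefficient values and threshold below are made-up numbers for illustration):

```python
import numpy as np

def soft_threshold(z, thresh):
    """Soft-thresholding operator: the per-coordinate lasso solution
    when the design matrix is orthonormal (thresh comes from the L1 penalty)."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

ols_coefs = np.array([2.5, -0.3, 0.05, -1.2])  # hypothetical OLS estimates
lasso_coefs = soft_threshold(ols_coefs, 0.5)

print(lasso_coefs)  # -> [ 2.   0.   0.  -0.7]: small coefficients hit exactly zero
```

Coefficients whose magnitude falls below the threshold are set exactly to zero, not merely shrunk, which is the sparsity that ridge (L2) regularization cannot produce.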

Constrained neural networks. Modern applications sometimes add constraints to neural network training, such as fairness constraints or robustness constraints. These are solved using KKT-based methods, often through augmented Lagrangian or penalty approaches.

Python implementation

You can solve constrained optimization problems in Python using scipy.optimize.minimize with the SLSQP method, which handles inequality constraints directly.

import numpy as np
from scipy.optimize import minimize

# Example 3B: Minimize (x-3)^2 + (y-3)^2
# subject to x + y <= 4, x >= 0, y >= 0

def objective(xy):
    x, y = xy
    return (x - 3)**2 + (y - 3)**2

# Inequality constraints use the convention: constraint(x) >= 0
constraints = [
    {"type": "ineq", "fun": lambda xy: 4 - xy[0] - xy[1]},  # x + y <= 4
    {"type": "ineq", "fun": lambda xy: xy[0]},                # x >= 0
    {"type": "ineq", "fun": lambda xy: xy[1]},                # y >= 0
]

result = minimize(objective, x0=[0.0, 0.0], method="SLSQP", constraints=constraints)

print(f"Optimal point: x = {result.x[0]:.4f}, y = {result.x[1]:.4f}")
print(f"Optimal value: {result.fun:.4f}")

Output:

Optimal point: x = 2.0000, y = 2.0000
Optimal value: 2.0000

This matches our hand calculation from Example 3B. The solver internally uses KKT conditions to find and verify the solution.

For problems where you need the multiplier values, scipy's result object does not expose them directly. For a full KKT analysis of convex problems, you can use CVXPY:

import cvxpy as cp

x = cp.Variable()
y = cp.Variable()

objective = cp.Minimize((x - 3)**2 + (y - 3)**2)
constraints = [x + y <= 4, x >= 0, y >= 0]

problem = cp.Problem(objective, constraints)
problem.solve()

print(f"Optimal point: x = {x.value:.4f}, y = {y.value:.4f}")
print(f"Optimal value: {problem.value:.4f}")
print(f"Multiplier for x + y <= 4: {constraints[0].dual_value:.4f}")
print(f"Multiplier for x >= 0:     {constraints[1].dual_value:.4f}")
print(f"Multiplier for y >= 0:     {constraints[2].dual_value:.4f}")

Output:

Optimal point: x = 2.0000, y = 2.0000
Optimal value: 2.0000
Multiplier for x + y <= 4: 2.0000
Multiplier for x >= 0:     0.0000
Multiplier for y >= 0:     0.0000

The multipliers confirm what we found by hand: $\mu_1 = 2$ for the active constraint, and $\mu_2 = \mu_3 = 0$ for the inactive ones.

Summary

Checking all KKT conditions

flowchart TD
  A["Candidate point x*"] --> B["1. Stationarity:<br/>gradient of Lagrangian = 0?"]
  B --> C["2. Primal feasibility:<br/>all constraints satisfied?"]
  C --> D["3. Dual feasibility:<br/>multipliers mu_i >= 0?"]
  D --> E["4. Complementary slackness:<br/>mu_i * g_i = 0 for all i?"]
  E --> F{"All four pass?"}
  F -->|"Yes + convex problem"| G["Global optimum confirmed"]
  F -->|"Yes + non-convex"| H["Candidate for local optimum"]
  F -->|"No"| I["Not a KKT point"]
  style G fill:#ccffcc
  style I fill:#ffcccc

The KKT conditions boil down to four checks:

| Condition | What it says | Why it matters |
|---|---|---|
| Stationarity | Gradients balance out | The point is a critical point of the Lagrangian |
| Primal feasibility | All constraints satisfied | The point is actually feasible |
| Dual feasibility | $\mu_i \geq 0$ | Inequality multipliers have the right sign |
| Complementary slackness | $\mu_i g_i = 0$ | Inactive constraints have zero multipliers |

For convex problems, these four conditions are the whole story. Find a point that satisfies them, and you have the global optimum.

What comes next

The KKT conditions tell you what the solution looks like, but they do not directly give you an algorithm to find it. For large-scale problems where you cannot solve the KKT system by hand, you need iterative methods. Penalty and barrier methods convert constrained problems into a sequence of unconstrained problems that you can solve with standard tools like gradient descent. They are the bridge between the theory of KKT and practical large-scale optimization.
