KKT conditions
In this series (18 parts)
- What is optimization and why ML needs it
- Convex sets and convex functions
- Optimality conditions: first order
- Optimality conditions: second order
- Line search methods
- Least squares: the closed-form solution
- Steepest descent (gradient descent)
- Newton's method for optimization
- Quasi-Newton methods: BFGS and L-BFGS
- Conjugate gradient methods
- Constrained optimization and Lagrangian duality
- KKT conditions
- Penalty and barrier methods
- Interior point methods
- The simplex method
- Frank-Wolfe method
- Optimization in dynamic programming and optimal control
- Stochastic gradient descent and variants
Prerequisites: This article builds directly on Lagrangian duality. You should be comfortable with Lagrange multipliers for equality-constrained problems before continuing.
A factory with resource constraints
A factory makes two products, A and B. Each unit of A earns $5 profit, each unit of B earns $4. But resources are limited:
| Resource | A uses per unit | B uses per unit | Available |
|---|---|---|---|
| Material | 2 kg | 1 kg | 10 kg |
| Labor | 1 hr | 2 hr | 8 hr |
The factory wants to maximize profit. The unconstrained answer would be “make infinitely many of both,” but the constraints fence in the possibilities. Only certain combinations of A and B are feasible.
The optimal point sits on the boundary of the feasible region, where the profit contours just touch the constraint walls. KKT conditions tell you exactly when a point is optimal for a constrained problem. They are a checklist: if all four conditions pass, you have found the answer (at least for convex problems).
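Before getting to the conditions themselves, it is worth confirming the factory numbers. The problem is a small linear program, and a quick sketch with scipy.optimize.linprog (which minimizes, so profit is negated) finds the best mix:

```python
from scipy.optimize import linprog

# The factory problem as a linear program. linprog minimizes, so negate profit.
res = linprog(
    c=[-5, -4],                 # maximize 5A + 4B
    A_ub=[[2, 1], [1, 2]],      # material and labor usage per unit of A and B
    b_ub=[10, 8],               # available material (kg) and labor (hr)
    bounds=[(0, None), (0, None)],
)
print(res.x, -res.fun)          # approximately [4, 2] and profit 28
```

The optimum makes 4 units of A and 2 units of B for $28 profit, sitting exactly where both resource constraints are tight.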
Feasible region with objective contours
graph TD
A["Feasible region:<br/>all constraint inequalities satisfied"] --> B["Objective contours<br/>sweep across the region"]
B --> C["Optimal point: where the best contour<br/>just touches the feasible boundary"]
C --> D["KKT conditions:<br/>four checks that confirm optimality"]
style C fill:#ccffcc
style D fill:#ccffcc
The relationship: unconstrained to equality to inequality constraints
graph LR
A["Unconstrained:<br/>set gradient = 0"] --> B["Equality constraints:<br/>Lagrange multipliers"]
B --> C["Inequality constraints:<br/>KKT conditions"]
A -->|"Simplest case"| D["No constraints to worry about"]
B -->|"Adds multipliers for equalities"| E["Gradient of Lagrangian = 0"]
C -->|"Adds sign constraints<br/>and complementary slackness"| F["Four conditions to check"]
KKT conditions are the most general first-order optimality conditions for smooth constrained problems. Everything else is a special case.
From equality to inequality constraints
The Lagrangian method handles equality constraints cleanly. You set up a Lagrangian, take gradients, set them to zero, and solve. But real optimization problems rarely come with only equalities. Budget limits, capacity bounds, non-negativity requirements: these are all inequalities.
The general constrained optimization problem looks like this:

$$\min_{x} \; f(x) \quad \text{subject to} \quad g_i(x) \le 0, \; i = 1, \dots, m, \qquad h_j(x) = 0, \; j = 1, \dots, p$$
The Karush-Kuhn-Tucker (KKT) conditions extend the Lagrangian framework to handle these inequality constraints. They give you a checklist: if a point passes all four conditions, you know it is a candidate for optimality. For convex problems, passing the checklist is enough to guarantee optimality.
The four KKT conditions
Given the problem above, form the Lagrangian:

$$\mathcal{L}(x, \mu, \lambda) = f(x) + \sum_{i=1}^{m} \mu_i \, g_i(x) + \sum_{j=1}^{p} \lambda_j \, h_j(x)$$

Here $\mu_i$ are the multipliers for inequality constraints and $\lambda_j$ are the multipliers for equality constraints. A point $x^*$ with multipliers $\mu^*$ and $\lambda^*$ satisfies the KKT conditions if all four of the following hold:
1. Stationarity

$$\nabla f(x^*) + \sum_{i=1}^{m} \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^{p} \lambda_j^* \nabla h_j(x^*) = 0$$

The gradient of the objective plus the weighted gradients of all constraints must sum to zero. This is the same idea as setting the gradient of the Lagrangian to zero.
2. Primal feasibility

$$g_i(x^*) \le 0 \;\; \text{for all } i, \qquad h_j(x^*) = 0 \;\; \text{for all } j$$

The solution must actually satisfy all the constraints. No cheating.
3. Dual feasibility

$$\mu_i^* \ge 0 \;\; \text{for all } i$$

The multipliers on inequality constraints must be non-negative. This is the key difference from equality constraints, where $\lambda_j$ can be any real number. The sign restriction on $\mu_i$ reflects the fact that inequality constraints can only “push” the solution in one direction.
4. Complementary slackness

$$\mu_i^* \, g_i(x^*) = 0 \;\; \text{for all } i$$

For each inequality constraint, either the constraint is active ($g_i(x^*) = 0$) or its multiplier is zero ($\mu_i^* = 0$), or both. This is the most interesting condition, and it deserves its own section.
Complementary slackness explained
Think about it this way. If a constraint is inactive, meaning $g_i(x^*) < 0$, the solution is strictly inside the feasible region with respect to that constraint. The constraint is not “touching” the solution. It might as well not exist. So its multiplier should be zero, because removing that constraint would not change the optimal value.
On the other hand, if a constraint is active, meaning $g_i(x^*) = 0$, the solution sits right on the boundary of that constraint. The constraint is binding. Its multiplier can be positive, reflecting the “cost” of that constraint being there.
The product $\mu_i^* \, g_i(x^*) = 0$ captures both cases in one equation. It is called complementary slackness because slack ($g_i(x^*) < 0$) and a positive multiplier ($\mu_i^* > 0$) cannot coexist.
Complementary slackness: active vs inactive constraints
graph TD
A["Constraint i"] --> B{"Is constraint active?<br/>g_i = 0?"}
B -->|"Yes: active, tight"| C["Constraint is binding<br/>Multiplier mu_i can be positive"]
B -->|"No: inactive, slack"| D["Constraint has room<br/>Multiplier mu_i must be 0"]
C --> E["mu_i * g_i = mu_i * 0 = 0 ✓"]
D --> F["mu_i * g_i = 0 * g_i = 0 ✓"]
style C fill:#ffcccc
style D fill:#ccffcc
This gives you a powerful solving strategy: for each inequality constraint, consider two cases. Either the constraint is active ($g_i(x) = 0$), or $\mu_i = 0$. Try each combination and see which one leads to a consistent solution.
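Here is a minimal sketch of that case enumeration on a hypothetical separable problem: minimize $(x - 2)^2 + (y - 3)^2$ subject to $x \le 1$ and $y \le 5$ (both the objective and the bounds are made up for illustration). Because the problem separates by coordinate, each case can be solved in closed form:

```python
from itertools import product

# Hypothetical separable example: minimize (x-2)^2 + (y-3)^2
# subject to x <= 1 and y <= 5, i.e. g1 = x - 1 <= 0 and g2 = y - 5 <= 0.
a = [2.0, 3.0]   # unconstrained minimizer per coordinate
b = [1.0, 5.0]   # upper bound per coordinate

solutions = []
# Each constraint is either active (x_i = b_i) or has mu_i = 0.
for active in product([False, True], repeat=2):
    x, mu = [], []
    ok = True
    for i in range(2):
        if active[i]:
            # Active: x_i = b_i; stationarity 2(x_i - a_i) + mu_i = 0.
            xi = b[i]
            mui = 2 * (a[i] - b[i])
        else:
            # Inactive: mu_i = 0; stationarity gives x_i = a_i.
            xi = a[i]
            mui = 0.0
        # Check primal feasibility (x_i <= b_i) and dual feasibility (mu_i >= 0).
        if xi > b[i] + 1e-12 or mui < -1e-12:
            ok = False
        x.append(xi)
        mu.append(mui)
    if ok:
        solutions.append((x, mu))

print(solutions)   # the only consistent case: x = [1, 3], mu = [2, 0]
```

Three of the four cases fail feasibility or the sign condition; the surviving one is the KKT point.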
Interpreting the multipliers
The multiplier $\mu_i^*$ tells you the sensitivity of the optimal value to constraint $i$. If you relax the constraint slightly, changing $g_i(x) \le 0$ to $g_i(x) \le \epsilon$, the optimal objective value decreases by approximately $\mu_i^* \epsilon$. A large multiplier means the constraint is “expensive”: it is costing you a lot in objective value.
If $\mu_i^* = 0$, the constraint is not affecting the solution at all. You could remove it and nothing would change.
Worked Example 1: Minimizing distance with a linear constraint
Problem: Minimize $f(x, y) = x^2 + y^2$ subject to $x + y \ge 2$.
Contour plot of f(x,y) = x^2 + y^2 with the inequality constraint x + y >= 2. The shaded region is feasible. The KKT point (1,1) lies on the constraint boundary where the gradient of f is proportional to the gradient of the constraint.
Step 1: Rewrite in standard form. The KKT framework uses $g(x) \le 0$, so rewrite the constraint:

$$g(x, y) = 2 - x - y \le 0$$

Step 2: Write the KKT conditions.
Stationarity:

$$2x - \mu = 0, \qquad 2y - \mu = 0$$

So: $2x = \mu$ and $2y = \mu$.
Primal feasibility: $2 - x - y \le 0$, i.e., $x + y \ge 2$.
Dual feasibility: $\mu \ge 0$.
Complementary slackness: $\mu \, (2 - x - y) = 0$.
Step 3: Consider two cases.
Case A: Constraint inactive ($\mu = 0$).
From stationarity: $2x = 0$ and $2y = 0$, giving $x = 0$, $y = 0$.
Check primal feasibility: $0 + 0 \ge 2$? No. ✗
The unconstrained minimum violates the constraint. This case fails.
Case B: Constraint active ($2 - x - y = 0$, so $x + y = 2$).
From stationarity: $x = \mu/2$ and $y = \mu/2$, so $x = y$.
Substituting into $x + y = 2$: $2x = 2$, so $x = 1$, $y = 1$.
From stationarity: $\mu = 2x = 2$.
Step 4: Verify all four conditions at $(x, y) = (1, 1)$, $\mu = 2$.
- ✓ Stationarity: $2(1) - 2 = 0$ and $2(1) - 2 = 0$.
- ✓ Primal feasibility: $1 + 1 = 2 \ge 2$.
- ✓ Dual feasibility: $\mu = 2 \ge 0$.
- ✓ Complementary slackness: $2 \cdot (2 - 1 - 1) = 0$.
The optimal value is $f(1, 1) = 1^2 + 1^2 = 2$.
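If you want to sanity-check this by machine, a short scipy sketch (using SLSQP, which expects constraints in the form constraint(x) >= 0) recovers the same point:

```python
import numpy as np
from scipy.optimize import minimize

# Numerical check of Example 1: minimize x^2 + y^2 subject to x + y >= 2.
result = minimize(
    lambda v: v[0]**2 + v[1]**2,
    x0=[0.0, 0.0],
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda v: v[0] + v[1] - 2}],
)
print(result.x)    # approximately [1, 1]
print(result.fun)  # approximately 2
```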
Worked Example 2: A 1D bound constraint
Problem: Minimize $f(x) = (x - 5)^2$ subject to $x \le 2$.
This is a simple one-dimensional problem. The objective wants $x = 5$, but the constraint says $x$ cannot exceed 2.
Step 1: Standard form. The constraint becomes $g(x) = x - 2 \le 0$.
Step 2: Write the KKT conditions.
Stationarity:

$$2(x - 5) + \mu = 0$$

Primal feasibility: $x - 2 \le 0$.
Dual feasibility: $\mu \ge 0$.
Complementary slackness: $\mu \, (x - 2) = 0$.
Step 3: Consider two cases.
Case A: Constraint inactive ($\mu = 0$).
From stationarity: $2(x - 5) = 0$, so $x = 5$.
Check primal feasibility: $5 \le 2$? No. ✗
The unconstrained optimum is infeasible.
Case B: Constraint active ($x = 2$).
From stationarity: $2(2 - 5) + \mu = 0$, so $-6 + \mu = 0$, giving $\mu = 6$.
Step 4: Verify all four conditions at $x = 2$, $\mu = 6$.
- ✓ Stationarity: $2(2 - 5) + 6 = 0$.
- ✓ Primal feasibility: $2 - 2 = 0 \le 0$.
- ✓ Dual feasibility: $\mu = 6 \ge 0$.
- ✓ Complementary slackness: $6 \cdot (2 - 2) = 0$.
The optimal value is $f(2) = (2 - 5)^2 = 9$.
The multiplier $\mu = 6$ tells you something concrete. If you relax the constraint to $x \le 2.1$, the optimal value decreases by approximately $\mu \cdot 0.1 = 0.6$, from 9 to about 8.4 (since $(2.1 - 5)^2 = 8.41$).
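A quick numerical check of this example, recovering the multiplier from the stationarity equation (a scipy SLSQP sketch):

```python
from scipy.optimize import minimize

# Numerical check of Example 2: minimize (x - 5)^2 subject to x <= 2.
res = minimize(
    lambda v: (v[0] - 5)**2,
    x0=[0.0],
    method="SLSQP",
    constraints=[{"type": "ineq", "fun": lambda v: 2 - v[0]}],  # x <= 2 as 2 - x >= 0
)
x_star = res.x[0]
# Recover the multiplier from stationarity: 2(x* - 5) + mu = 0.
mu = -2 * (x_star - 5)
print(x_star, res.fun, mu)  # approximately 2, 9, 6
```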
Worked Example 3: A constrained quadratic program
Part A: Unconstrained minimum is feasible
Problem: Minimize $f(x, y) = (x - 1)^2 + (y - 1)^2$ subject to $x + y \le 4$, $x \ge 0$, $y \ge 0$.
Step 1: Standard form. Rewrite all constraints as $g_i \le 0$:

$$g_1 = x + y - 4 \le 0, \qquad g_2 = -x \le 0, \qquad g_3 = -y \le 0$$

Step 2: Check the unconstrained minimum. $\nabla f = (2(x - 1), \, 2(y - 1)) = 0$ gives $(x, y) = (1, 1)$.
Is $(1, 1)$ feasible?
- $1 + 1 = 2 \le 4$ ✓
- $1 \ge 0$ ✓
- $1 \ge 0$ ✓
Yes. The unconstrained minimum lies inside the feasible region. So no constraint is active, and all multipliers are zero: $\mu_1 = \mu_2 = \mu_3 = 0$.
Step 3: Verify KKT conditions at $(1, 1)$ with $\mu = 0$.
- ✓ Stationarity: $\nabla f(1, 1) = (0, 0)$, and all multiplier terms vanish.
- ✓ Primal feasibility: all constraints satisfied as shown above.
- ✓ Dual feasibility: all $\mu_i = 0 \ge 0$.
- ✓ Complementary slackness: $\mu_i \, g_i = 0$ for every $i$ since every $\mu_i = 0$.
Optimal value: $f(1, 1) = 0$.
Part B: Unconstrained minimum is infeasible
Problem: Minimize $f(x, y) = (x - 3)^2 + (y - 3)^2$ subject to $x + y \le 4$, $x \ge 0$, $y \ge 0$.
The unconstrained minimum is $(3, 3)$, but $3 + 3 = 6 > 4$. The point violates the first constraint.
Step 1: Which constraints are active? By symmetry (the objective and the binding constraint treat $x$ and $y$ identically), expect $x = y$ at the solution. If $x = y$ and $x + y = 4$, then $x = y = 2$. Both $x \ge 0$ and $y \ge 0$ are satisfied but not active.
So assume $g_1 = x + y - 4 \le 0$ is active ($x + y = 4$) and $\mu_2 = \mu_3 = 0$.
Step 2: KKT stationarity with only $g_1$ active.

$$2(x - 3) + \mu_1 = 0, \qquad 2(y - 3) + \mu_1 = 0$$

From the first equation: $\mu_1 = -2(x - 3)$.
From the second equation: $\mu_1 = -2(y - 3)$.
Subtracting: $2(x - y) = 0$, so $x = y$.
With $x + y = 4$: $x = y = 2$.
Substituting back: $2(2 - 3) + \mu_1 = 0$, so $-2 + \mu_1 = 0$, giving $\mu_1 = 2$.
Step 3: Verify all KKT conditions at $(x, y) = (2, 2)$, $\mu_1 = 2$, $\mu_2 = \mu_3 = 0$.
- ✓ Stationarity: $2(2 - 3) + 2 = 0$ for both components.
- ✓ Primal feasibility: $2 + 2 = 4 \le 4$; $2 \ge 0$; $2 \ge 0$.
- ✓ Dual feasibility: $\mu_1 = 2 \ge 0$, $\mu_2 = 0 \ge 0$, $\mu_3 = 0 \ge 0$.
- ✓ Complementary slackness: $2 \cdot (4 - 4) = 0$; $0 \cdot (-2) = 0$; $0 \cdot (-2) = 0$.
Optimal value: $f(2, 2) = (2 - 3)^2 + (2 - 3)^2 = 2$.
Again, $\mu_1 = 2$ tells you that relaxing the constraint to $x + y \le 4.1$ would decrease the optimal value by about $\mu_1 \cdot 0.1 = 0.2$.
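You can verify this sensitivity claim numerically by re-solving the problem with a relaxed budget (a scipy sketch; the helper solve and the name budget are invented here for illustration):

```python
from scipy.optimize import minimize

def solve(budget):
    # Example 3B with the first constraint relaxed to x + y <= budget.
    res = minimize(
        lambda v: (v[0] - 3)**2 + (v[1] - 3)**2,
        x0=[0.0, 0.0],
        method="SLSQP",
        constraints=[
            {"type": "ineq", "fun": lambda v: budget - v[0] - v[1]},  # x + y <= budget
            {"type": "ineq", "fun": lambda v: v[0]},                  # x >= 0
            {"type": "ineq", "fun": lambda v: v[1]},                  # y >= 0
        ],
    )
    return res.fun

f0 = solve(4.0)   # approximately 2.0
f1 = solve(4.1)   # approximately 1.805
print(f0 - f1)    # approximately 0.195, close to the prediction mu * 0.1 = 0.2
```

The prediction is first-order, so the actual decrease (0.195) is close to, but not exactly, $\mu_1 \cdot 0.1$.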
KKT for convex vs non-convex problems
For convex problems (convex objective, convex inequality constraints, affine equality constraints), the KKT conditions are both necessary and sufficient for global optimality. If you find a point satisfying all four conditions, you have the global minimum. No further checking needed.
All three examples above were convex, so verifying KKT was enough to confirm optimality.
For non-convex problems, KKT conditions are only necessary (under a technical condition called constraint qualification). A point can satisfy all four KKT conditions and still be merely a local minimum, a saddle point, or even a local maximum. You would need second-order conditions involving the Hessian to distinguish these cases.
Constraint qualification is a regularity condition that ensures the KKT conditions actually apply. The most common one is the Linear Independence Constraint Qualification (LICQ): the gradients of the active constraints at $x^*$ must be linearly independent. In practice, LICQ holds for most well-posed problems. When it fails, you can get situations where the KKT conditions have no solution even though an optimal point exists.
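Checking LICQ numerically amounts to a rank test on the active-constraint gradients. A minimal numpy sketch, using the single active constraint from Example 3B and a contrived duplicate-constraint failure case:

```python
import numpy as np

# LICQ at a candidate point: stack the gradients of the active constraints
# as rows and test linear independence via matrix rank.
# At x* = (2, 2) in Example 3B, only x + y <= 4 is active, with gradient (1, 1).
active_grads = np.array([[1.0, 1.0]])
licq_holds = np.linalg.matrix_rank(active_grads) == active_grads.shape[0]
print(licq_holds)  # True: a single nonzero gradient is trivially independent

# A failure case: duplicate the constraint x + y <= 4, so two identical
# gradients are active at x* and the rows are linearly dependent.
dup_grads = np.array([[1.0, 1.0], [1.0, 1.0]])
print(np.linalg.matrix_rank(dup_grads) == dup_grads.shape[0])  # False
```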
Connection to machine learning
KKT conditions are not just theoretical. They appear directly in several ML algorithms.
Support vector machines. The SVM optimization problem minimizes $\frac{1}{2}\|w\|^2$ subject to the constraint that each training point is correctly classified with a margin: $y_i(w^\top x_i + b) \ge 1$. The KKT conditions for this problem tell you that:
- Most data points have $\alpha_i = 0$ (they are not support vectors).
- The support vectors are the points where the constraint is active ($y_i(w^\top x_i + b) = 1$).
Complementary slackness is what makes SVMs sparse: only a few training points actually determine the decision boundary.
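You can watch this sparsity appear by solving the hard-margin SVM dual directly. The sketch below uses scipy's SLSQP on a made-up six-point dataset; the dual maximizes $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j$ subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable dataset: three points per class (made up).
X = np.array([[2., 0.], [3., 1.], [3., -1.], [-2., 0.], [-3., 1.], [-3., -1.]])
y = np.array([1., 1., 1., -1., -1., -1.])

# Gram matrix with the labels folded in: K[i, j] = y_i y_j x_i . x_j
K = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(alpha):
    # Negate the dual objective so we can use a minimizer.
    return 0.5 * alpha @ K @ alpha - alpha.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(6),
    method="SLSQP",
    bounds=[(0, None)] * 6,                          # dual feasibility: alpha >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],
)
alpha = np.round(res.x, 4)
print(alpha)  # only the two margin points, (2, 0) and (-2, 0), get nonzero alpha
```

Four of the six multipliers come out zero: those points sit strictly outside the margin, so complementary slackness forces their $\alpha_i$ to vanish.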
Regularized regression. L1 regularization (lasso) can be formulated as a constrained optimization problem: minimize the squared error subject to $\|\beta\|_1 \le t$. The KKT conditions explain why lasso produces sparse solutions. Stationarity involves the subgradient of the $\ell_1$ norm, which allows a coefficient to sit exactly at zero whenever its feature's correlation with the residual is below the regularization threshold; for many features, the only consistent choice is $\beta_j = 0$.
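For the special case of an orthonormal design matrix, the lasso KKT conditions reduce to soft-thresholding the least-squares coefficients, which makes the exact zeros easy to see (a minimal numpy sketch; the coefficient values are made up):

```python
import numpy as np

# For an orthonormal design, the lasso KKT conditions reduce to
# soft-thresholding: shrink each least-squares coefficient toward zero by
# lam, and set it exactly to zero if its magnitude is below lam.
def soft_threshold(beta_ls, lam):
    return np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam, 0.0)

beta_ls = np.array([3.0, -0.4, 1.5, 0.2])
beta_lasso = soft_threshold(beta_ls, lam=0.5)
print(beta_lasso)  # the two small coefficients become exactly zero
```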
Constrained neural networks. Modern applications sometimes add constraints to neural network training, such as fairness constraints or robustness constraints. These are solved using KKT-based methods, often through augmented Lagrangian or penalty approaches.
Python implementation
You can solve constrained optimization problems in Python using scipy.optimize.minimize with the SLSQP method, which handles inequality constraints directly.
import numpy as np
from scipy.optimize import minimize
# Example 3B: Minimize (x-3)^2 + (y-3)^2
# subject to x + y <= 4, x >= 0, y >= 0
def objective(xy):
x, y = xy
return (x - 3)**2 + (y - 3)**2
# Inequality constraints use the convention: constraint(x) >= 0
constraints = [
{"type": "ineq", "fun": lambda xy: 4 - xy[0] - xy[1]}, # x + y <= 4
{"type": "ineq", "fun": lambda xy: xy[0]}, # x >= 0
{"type": "ineq", "fun": lambda xy: xy[1]}, # y >= 0
]
result = minimize(objective, x0=[0.0, 0.0], method="SLSQP", constraints=constraints)
print(f"Optimal point: x = {result.x[0]:.4f}, y = {result.x[1]:.4f}")
print(f"Optimal value: {result.fun:.4f}")
Output:
Optimal point: x = 2.0000, y = 2.0000
Optimal value: 2.0000
This matches our hand calculation from Example 3B. The solver internally uses KKT conditions to find and verify the solution.
For problems where you need the multiplier values, the result object does not expose them directly in scipy. For full KKT analysis, you can use CVXPY for convex problems:
import cvxpy as cp
x = cp.Variable()
y = cp.Variable()
objective = cp.Minimize((x - 3)**2 + (y - 3)**2)
constraints = [x + y <= 4, x >= 0, y >= 0]
problem = cp.Problem(objective, constraints)
problem.solve()
print(f"Optimal point: x = {x.value:.4f}, y = {y.value:.4f}")
print(f"Optimal value: {problem.value:.4f}")
print(f"Multiplier for x + y <= 4: {constraints[0].dual_value:.4f}")
print(f"Multiplier for x >= 0: {constraints[1].dual_value:.4f}")
print(f"Multiplier for y >= 0: {constraints[2].dual_value:.4f}")
Output:
Optimal point: x = 2.0000, y = 2.0000
Optimal value: 2.0000
Multiplier for x + y <= 4: 2.0000
Multiplier for x >= 0: 0.0000
Multiplier for y >= 0: 0.0000
The multipliers confirm what we found by hand: for the active constraint, and for the inactive ones.
Summary
Checking all KKT conditions
flowchart TD
A["Candidate point x*"] --> B["1. Stationarity:<br/>gradient of Lagrangian = 0?"]
B --> C["2. Primal feasibility:<br/>all constraints satisfied?"]
C --> D["3. Dual feasibility:<br/>multipliers mu_i >= 0?"]
D --> E["4. Complementary slackness:<br/>mu_i * g_i = 0 for all i?"]
E --> F{"All four pass?"}
F -->|"Yes + convex problem"| G["Global optimum confirmed"]
F -->|"Yes + non-convex"| H["Candidate for local optimum"]
F -->|"No"| I["Not a KKT point"]
style G fill:#ccffcc
style I fill:#ffcccc
The KKT conditions boil down to four checks:
| Condition | What it says | Why it matters |
|---|---|---|
| Stationarity | Gradients balance out | The point is a critical point of the Lagrangian |
| Primal feasibility | All constraints satisfied | The point is actually feasible |
| Dual feasibility | Inequality multipliers are non-negative | Inequality constraints can only “push” in one direction |
| Complementary slackness | Inactive constraints have zero multipliers | Only binding constraints influence the solution |
For convex problems, these four conditions are the whole story. Find a point that satisfies them, and you have the global optimum.
What comes next
The KKT conditions tell you what the solution looks like, but they do not directly give you an algorithm to find it. For large-scale problems where you cannot solve the KKT system by hand, you need iterative methods. Penalty and barrier methods convert constrained problems into a sequence of unconstrained problems that you can solve with standard tools like gradient descent. They are the bridge between the theory of KKT and practical large-scale optimization.