
Optimality conditions: second order

In this series (18 parts)
  1. What is optimization and why ML needs it
  2. Convex sets and convex functions
  3. Optimality conditions: first order
  4. Optimality conditions: second order
  5. Line search methods
  6. Least squares: the closed-form solution
  7. Steepest descent (gradient descent)
  8. Newton's method for optimization
  9. Quasi-Newton methods: BFGS and L-BFGS
  10. Conjugate gradient methods
  11. Constrained optimization and Lagrangian duality
  12. KKT conditions
  13. Penalty and barrier methods
  14. Interior point methods
  15. The simplex method
  16. Frank-Wolfe method
  17. Optimization in dynamic programming and optimal control
  18. Stochastic gradient descent and variants

First-order conditions tell you where the critical points are. Second-order conditions tell you what kind of critical point you are looking at. The tool is the Hessian matrix, and the key property is positive definiteness.

The intuition: what kind of critical point is this?

First-order conditions find flat spots where the gradient is zero. But a flat spot could be a valley floor, a hilltop, or a mountain pass. Second-order conditions tell you which one.

Consider three simple functions and their critical points at $x = 0$:

| Function | $f(0)$ | $f''(0)$ | Shape at origin | Classification |
|---|---|---|---|---|
| $f(x) = x^2$ | 0 | 2 (positive) | Curves up like a bowl | Minimum |
| $f(x) = -x^2$ | 0 | $-2$ (negative) | Curves down like a hill | Maximum |
| $f(x) = x^3$ | 0 | 0 | Flat inflection | Neither (saddle in higher dims) |

Three types of critical points

graph LR
  A["Bowl (minimum): curves up in all directions"] --- B["Hill (maximum): curves down in all directions"]
  B --- C["Saddle point: curves up in some directions, down in others"]

The Hessian matrix captures curvature in every direction simultaneously.

Three types of critical points at x = 0: minimum (x^2 curves up), maximum (-x^2 curves down), and inflection (x^3 is flat)

Positive curvature everywhere means you are at a valley floor. Negative curvature everywhere means a hilltop. Mixed curvature means a saddle, like the middle of a horse saddle that curves up front-to-back and down side-to-side.

Now let’s make this precise.

Prerequisites

You should be comfortable with:

The Hessian matrix

For a twice-differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian is the $n \times n$ matrix of second partial derivatives:

$$\nabla^2 f(\mathbf{x}) = H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

If $f$ has continuous second derivatives (which is almost always the case in ML), the Hessian is symmetric: $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$.
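
A quick way to see this symmetry in practice is to approximate the Hessian numerically. The sketch below is illustrative (the helper name `numerical_hessian` and the test point are my choices, not from a particular library); it builds the matrix entry by entry with central differences:

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x with central finite differences."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.zeros(n), np.zeros(n)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h**2)
    return H

# f(x, y) = x^2 - 4xy + y^2 has constant Hessian [[2, -4], [-4, 2]]
f = lambda x: x[0]**2 - 4*x[0]*x[1] + x[1]**2
H = numerical_hessian(f, np.array([1.0, 2.0]))
print(H)                    # close to [[2, -4], [-4, 2]]
print(np.allclose(H, H.T))  # True: the mixed partials agree
```

The off-diagonal entries come out (numerically) equal, exactly as the symmetry result promises.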

Positive definiteness

A symmetric matrix $H$ is:

  • Positive definite ($H \succ 0$) if $\mathbf{v}^T H \mathbf{v} > 0$ for all $\mathbf{v} \neq \mathbf{0}$
  • Positive semidefinite ($H \succeq 0$) if $\mathbf{v}^T H \mathbf{v} \geq 0$ for all $\mathbf{v}$
  • Negative definite ($H \prec 0$) if $\mathbf{v}^T H \mathbf{v} < 0$ for all $\mathbf{v} \neq \mathbf{0}$
  • Indefinite if $\mathbf{v}^T H \mathbf{v}$ takes both positive and negative values

An equivalent check uses eigenvalues:

  • Positive definite: all eigenvalues are positive
  • Positive semidefinite: all eigenvalues are non-negative
  • Negative definite: all eigenvalues are negative
  • Indefinite: eigenvalues have mixed signs

For $2 \times 2$ matrices, there is a shortcut. A symmetric matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$ is:

  • Positive definite if $a > 0$ and $ac - b^2 > 0$
  • Negative definite if $a < 0$ and $ac - b^2 > 0$
  • Indefinite if $ac - b^2 < 0$

The quantity $ac - b^2$ is the determinant of the matrix.
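
The shortcut is easy to mechanize. A minimal sketch (the function name `classify_2x2` is an illustrative choice) that applies it and cross-checks one case against the eigenvalue test:

```python
import numpy as np

def classify_2x2(a, b, c):
    """Apply the determinant shortcut to the symmetric matrix [[a, b], [b, c]]."""
    det = a * c - b**2
    if det > 0:
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "semidefinite (shortcut inconclusive)"

print(classify_2x2(4, 0, 6))   # positive definite
print(classify_2x2(2, -4, 2))  # indefinite

# Cross-check the second case against the eigenvalue criterion:
print(np.linalg.eigvalsh(np.array([[2.0, -4.0], [-4.0, 2.0]])))  # one negative, one positive
```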

Second-order optimality conditions

Second-order necessary condition. If $\mathbf{x}^*$ is a local minimum of $f$, then:

  1. $\nabla f(\mathbf{x}^*) = \mathbf{0}$ (first-order condition)
  2. $\nabla^2 f(\mathbf{x}^*) \succeq 0$ (Hessian is positive semidefinite)

Second-order sufficient condition. If at a point $\mathbf{x}^*$:

  1. $\nabla f(\mathbf{x}^*) = \mathbf{0}$
  2. $\nabla^2 f(\mathbf{x}^*) \succ 0$ (Hessian is positive definite)

then $\mathbf{x}^*$ is a strict local minimum.

The gap between “necessary” and “sufficient” is the semidefinite case. When the Hessian is positive semidefinite but not positive definite (some eigenvalue is exactly zero), the second-order test is inconclusive. You need higher-order information.

Classification summary

At a critical point $\mathbf{x}^*$ where $\nabla f = \mathbf{0}$:

| Hessian at $\mathbf{x}^*$ | Conclusion |
|---|---|
| Positive definite | Strict local minimum ✓ |
| Negative definite | Strict local maximum |
| Indefinite | Saddle point |
| Semidefinite (singular) | Inconclusive, need more analysis |
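
This decision table translates directly into code. A small sketch (the function name and the zero-eigenvalue tolerance `tol` are illustrative choices):

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalue signs of its symmetric Hessian."""
    eigs = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.any(np.abs(eigs) < tol):
        return "inconclusive (zero eigenvalue)"
    if np.all(eigs > 0):
        return "strict local minimum"
    if np.all(eigs < 0):
        return "strict local maximum"
    return "saddle point"

print(classify_critical_point([[4, 0], [0, 6]]))    # strict local minimum
print(classify_critical_point([[2, -4], [-4, 2]]))  # saddle point
print(classify_critical_point([[1, 0], [0, 0]]))    # inconclusive (zero eigenvalue)
```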

Classifying a critical point using the Hessian eigenvalues

graph TD
  A["Find critical point: gradient = 0"] --> B["Compute Hessian at that point"]
  B --> C["Find eigenvalues of Hessian"]
  C --> D["All positive?"]
  D -- Yes --> E["Local minimum"]
  D -- No --> F["All negative?"]
  F -- Yes --> G["Local maximum"]
  F -- No --> H["Mixed signs?"]
  H -- Yes --> I["Saddle point"]
  H -- No --> J["Some zero eigenvalues: inconclusive"]

Eigenvalue sign to classification

graph LR
  A["All eigenvalues > 0"] --> B["Minimum: bowl shape"]
  C["All eigenvalues < 0"] --> D["Maximum: hilltop"]
  E["Mixed positive and negative"] --> F["Saddle point: horse saddle"]

Example 1: Classify the critical point of a quadratic

Consider:

$$f(x, y) = 2x^2 + 3y^2 - 4x + 6y + 10$$

Step 1. Find the gradient and set it to zero:

$$\frac{\partial f}{\partial x} = 4x - 4 = 0 \quad \Rightarrow \quad x = 1$$

$$\frac{\partial f}{\partial y} = 6y + 6 = 0 \quad \Rightarrow \quad y = -1$$

Critical point: $(1, -1)$.

Step 2. Compute the Hessian:

$$H = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 6 \end{bmatrix}$$

Step 3. Check positive definiteness using the $2 \times 2$ shortcut:

$$a = 4 > 0 \quad \checkmark$$

$$\det(H) = 4 \cdot 6 - 0^2 = 24 > 0 \quad \checkmark$$

The Hessian is positive definite.

Step 4. Confirm with eigenvalues. Since $H$ is diagonal, the eigenvalues are $4$ and $6$, both positive.

Conclusion: $(1, -1)$ is a strict local minimum with $f(1, -1) = 2 + 3 - 4 - 6 + 10 = 5$. ✓

Since this is a quadratic with positive definite Hessian, it is also convex, so this is the global minimum.

import numpy as np

H = np.array([[4, 0], [0, 6]])
eigenvalues = np.linalg.eigvalsh(H)
print(f"Eigenvalues: {eigenvalues}")   # [4. 6.]
print(f"Positive definite: {all(eigenvalues > 0)}")  # True

f = lambda x, y: 2*x**2 + 3*y**2 - 4*x + 6*y + 10
print(f"f(1, -1) = {f(1, -1)}")  # 5

Example 2: A saddle point

Consider:

$$f(x, y) = x^2 - 4xy + y^2$$

Step 1. Gradient:

$$\frac{\partial f}{\partial x} = 2x - 4y = 0$$

$$\frac{\partial f}{\partial y} = -4x + 2y = 0$$

From the first equation: $x = 2y$. Substituting into the second: $-8y + 2y = -6y = 0$, so $y = 0$ and $x = 0$.

Critical point: $(0, 0)$ with $f(0, 0) = 0$.

Step 2. Hessian:

$$H = \begin{bmatrix} 2 & -4 \\ -4 & 2 \end{bmatrix}$$

Step 3. Check definiteness:

$$a = 2 > 0$$

$$\det(H) = 2 \cdot 2 - (-4)^2 = 4 - 16 = -12 < 0$$

The determinant is negative, so the Hessian is indefinite.

Step 4. Verify with eigenvalues:

$$\det(H - \lambda I) = (2 - \lambda)^2 - 16 = 0$$

$$2 - \lambda = \pm 4$$

$$\lambda_1 = 6, \quad \lambda_2 = -2$$

One positive, one negative: indefinite. ✓

Conclusion: $(0, 0)$ is a saddle point. The function curves upward in the direction of the eigenvector for $\lambda = 6$ and downward in the direction of the eigenvector for $\lambda = -2$.

Verification with specific directions:

Along $\mathbf{d} = (1, 1)$: $f(t, t) = t^2 - 4t^2 + t^2 = -2t^2$. This goes down, so the function decreases.

Along $\mathbf{d} = (1, -1)$: $f(t, -t) = t^2 + 4t^2 + t^2 = 6t^2$. This goes up, so the function increases.

import numpy as np

H = np.array([[2, -4], [-4, 2]])
eigenvalues = np.linalg.eigvalsh(H)
print(f"Eigenvalues: {eigenvalues}")  # [-2.  6.]
print(f"Indefinite (mixed signs): {eigenvalues[0] * eigenvalues[-1] < 0}")  # True

f = lambda x, y: x**2 - 4*x*y + y**2
print(f"f(0.1, 0.1) = {f(0.1, 0.1)}")    # -0.02 (below 0)
print(f"f(0.1, -0.1) = {f(0.1, -0.1)}")   # 0.06 (above 0)

Contour plot of the saddle surface $f(x, y) = x^2 - 4xy + y^2$. The hyperbolic contours show curvature is positive along the $(1, -1)$ direction and negative along the $(1, 1)$ direction.

Example 3: A function with two critical points of different types

Consider:

$$f(x, y) = x^3 + y^3 - 3x - 3y$$

Step 1. Gradient:

$$\frac{\partial f}{\partial x} = 3x^2 - 3 = 0 \quad \Rightarrow \quad x = \pm 1$$

$$\frac{\partial f}{\partial y} = 3y^2 - 3 = 0 \quad \Rightarrow \quad y = \pm 1$$

This gives four critical points: $(1, 1)$, $(1, -1)$, $(-1, 1)$, $(-1, -1)$.

Step 2. Hessian:

$$H(x, y) = \begin{bmatrix} 6x & 0 \\ 0 & 6y \end{bmatrix}$$

Step 3. Evaluate the Hessian at each critical point:

At $(1, 1)$:

$$H = \begin{bmatrix} 6 & 0 \\ 0 & 6 \end{bmatrix}$$

Eigenvalues: $6, 6$. Both positive. Strict local minimum.

$$f(1, 1) = 1 + 1 - 3 - 3 = -4$$

At $(-1, -1)$:

$$H = \begin{bmatrix} -6 & 0 \\ 0 & -6 \end{bmatrix}$$

Eigenvalues: $-6, -6$. Both negative. Strict local maximum.

$$f(-1, -1) = -1 - 1 + 3 + 3 = 4$$

At $(1, -1)$:

$$H = \begin{bmatrix} 6 & 0 \\ 0 & -6 \end{bmatrix}$$

Eigenvalues: $6, -6$. Mixed signs. Saddle point.

$$f(1, -1) = 1 - 1 - 3 + 3 = 0$$

At $(-1, 1)$:

$$H = \begin{bmatrix} -6 & 0 \\ 0 & 6 \end{bmatrix}$$

Eigenvalues: $-6, 6$. Mixed signs. Saddle point.

$$f(-1, 1) = -1 + 1 + 3 - 3 = 0$$

Summary of results:

| Critical point | $f$ value | Eigenvalues | Type |
|---|---|---|---|
| $(1, 1)$ | $-4$ | $6, 6$ | Local minimum ✓ |
| $(-1, -1)$ | $4$ | $-6, -6$ | Local maximum |
| $(1, -1)$ | $0$ | $6, -6$ | Saddle point |
| $(-1, 1)$ | $0$ | $-6, 6$ | Saddle point |

import numpy as np

f = lambda x, y: x**3 + y**3 - 3*x - 3*y

critical_points = [(1,1), (-1,-1), (1,-1), (-1,1)]
for (x, y) in critical_points:
    H = np.array([[6*x, 0], [0, 6*y]])
    eigs = np.linalg.eigvalsh(H)
    val = f(x, y)
    if all(eigs > 0):
        kind = "local min"
    elif all(eigs < 0):
        kind = "local max"
    else:
        kind = "saddle"
    print(f"({x:+d},{y:+d}): f={val:+.0f}, eigs={eigs}, {kind}")

The inconclusive case

When the Hessian is positive semidefinite but not positive definite (at least one zero eigenvalue), the second-order test tells you nothing. You need to look at higher-order derivatives or use other methods.

Classic example: $f(x) = x^4$. At $x = 0$:

  • $f'(0) = 0$ ✓ (critical point)
  • $f''(0) = 0$ (inconclusive)

But $x = 0$ is clearly a minimum because $x^4 \geq 0$ for all $x$. The second-order test just cannot confirm it.

Another example: $f(x) = x^3$. At $x = 0$:

  • $f'(0) = 0$ ✓ (critical point)
  • $f''(0) = 0$ (inconclusive)

Here $x = 0$ is neither a minimum nor a maximum; it is an inflection point.
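
One informal way to tell these cases apart numerically is to sample the function on either side of the critical point. This is a heuristic sketch, not a proof: it assumes the critical value is $f(0) = 0$ (true for both examples), and the probe width `eps` is an arbitrary illustrative choice:

```python
def probe(f, eps=1e-3):
    """Heuristically classify a critical point at x = 0 with f(0) = 0 by sampling."""
    left, right = f(-eps), f(eps)
    if left > 0 and right > 0:
        return "looks like a local minimum"
    if left < 0 and right < 0:
        return "looks like a local maximum"
    return "neither (function changes sign)"

print(probe(lambda x: x**4))  # looks like a local minimum
print(probe(lambda x: x**3))  # neither (function changes sign)
```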

Connection to curvature and ML

Why saddle points matter more than local minima

In high-dimensional optimization (like training neural networks), saddle points are far more common than local minima. A critical point is a local minimum only if the Hessian has all positive eigenvalues. In a space with millions of dimensions, the probability that all eigenvalues happen to be positive is very small. Most critical points in neural network loss surfaces are saddle points.

This is actually good news. Gradient descent and its variants (SGD, Adam) can escape saddle points because the noise in stochastic gradients pushes the parameters along directions of negative curvature. Pure gradient descent (without noise) can get stuck at saddle points because the gradient is zero there.
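
A tiny sketch of this effect, using the textbook saddle $f(x, y) = x^2 - y^2$ with an illustrative step size and noise scale (not any particular training setup):

```python
import numpy as np

# f(x, y) = x^2 - y^2 has a saddle at the origin; its gradient is (2x, -2y).
grad = lambda p: np.array([2 * p[0], -2 * p[1]])

lr = 0.1
rng = np.random.default_rng(0)
p_plain = np.zeros(2)  # start exactly at the saddle, where the gradient is zero
p_noisy = np.zeros(2)
for _ in range(50):
    p_plain = p_plain - lr * grad(p_plain)  # zero gradient: never moves
    p_noisy = p_noisy - lr * (grad(p_noisy) + rng.normal(scale=0.01, size=2))

print(p_plain)  # [0. 0.] -- stuck at the saddle
print(p_noisy)  # nonzero: noise pushed it out along the negative-curvature (y) direction
```

The noisy iterate drifts along $y$, where the curvature is negative, because any small perturbation in that direction is amplified by the subsequent gradient steps.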

Condition number

The ratio of the largest to smallest eigenvalue of the Hessian, $\kappa = \lambda_{\max} / \lambda_{\min}$, is the condition number. A large condition number means the function is steep in some directions and flat in others. This makes optimization harder because a single step size cannot work well in all directions. Newton’s method handles this by using the Hessian to scale the step, which is why it converges faster than gradient descent on ill-conditioned problems.
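
A short sketch computing $\kappa$ for the Example 1 Hessian and for an assumed ill-conditioned variant (the second matrix is an illustrative choice):

```python
import numpy as np

for H in (np.array([[4.0, 0.0], [0.0, 6.0]]),      # Example 1: well conditioned
          np.array([[100.0, 0.0], [0.0, 1.0]])):   # steep in x, flat in y
    eigs = np.linalg.eigvalsh(H)  # returned in ascending order
    kappa = eigs[-1] / eigs[0]
    print(f"eigenvalues {eigs}, condition number {kappa}")

# For the quadratic f(x) = (1/2) x^T H x, gradient descent converges only
# for step sizes below 2 / lambda_max, while progress along the flattest
# direction scales with lambda_min -- a large ratio forces many iterations.
```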

Summary

| Condition | Meaning |
|---|---|
| $H \succ 0$ (positive definite) | Local minimum ✓ |
| $H \prec 0$ (negative definite) | Local maximum |
| $H$ indefinite | Saddle point |
| $H$ semidefinite | Inconclusive |
| High condition number | Hard for gradient descent |

The Hessian test gives you a definitive answer at most critical points. When it is inconclusive, you need more sophisticated tools. For practical ML, the main takeaway is that saddle points dominate high-dimensional landscapes, and stochastic methods handle them naturally.

What comes next

Now that you can find and classify critical points, the next question is: how do you get there? Line search methods are the building block of iterative optimization algorithms, controlling how far you step in a chosen direction.
