Supervised Learning - Learning from Labeled Data
In this series (5 parts)
- Introduction to Machine Learning
- Supervised Learning - Learning from Labeled Data
- Regression - Predicting Continuous Values
- Classification - Predicting Categories
- Unsupervised Learning - Finding Hidden Patterns
In the previous post, we defined machine learning as teaching computers to learn from data. Now let’s look at the most common and practical approach: supervised learning.
What is Supervised Learning?
Supervised learning is when you train a model using labeled data - meaning each training example comes with both the input and the correct answer.
Think of it like a student learning with an answer key:
```mermaid
flowchart LR
    A["Training Data (inputs + labels)"] --> B["Learning Algorithm"]
    B --> C["Trained Model"]
    C --> D["New Input"]
    D --> E["Prediction"]
    style B fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style C fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
```
The model sees thousands of examples, compares its predictions to the correct answers, and adjusts itself to minimize errors. Once trained, it can make predictions on data it’s never seen before.
A Real-World Dataset
Let’s work with a concrete example: predicting house prices. Here’s a sample of our training data:
| Size (sq ft) | Bedrooms | Age (years) | Garage | Neighborhood | Price ($) |
|---|---|---|---|---|---|
| 1,400 | 3 | 15 | Yes | Suburban | 285,000 |
| 850 | 1 | 30 | No | Urban | 165,000 |
| 2,200 | 4 | 5 | Yes | Suburban | 425,000 |
| 1,100 | 2 | 20 | No | Urban | 210,000 |
| 3,000 | 5 | 2 | Yes | Rural | 380,000 |
| 1,600 | 3 | 10 | Yes | Urban | 340,000 |
| 900 | 2 | 25 | No | Suburban | 195,000 |
| 2,500 | 4 | 8 | Yes | Suburban | 465,000 |
In this dataset:
- Features (inputs): Size, Bedrooms, Age, Garage, Neighborhood
- Label (target): Price
The model’s job is to learn the relationship between features and the label so it can predict the price of a house it’s never seen.
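In code, this feature/label separation is usually the first step. Here is a minimal sketch using three rows from the table above, with illustrative field names (`size_sqft`, `bedrooms`, and so on are naming choices, not a fixed API):

```python
# A few rows from the house-price table as (features, label) pairs.
training_data = [
    ({"size_sqft": 1400, "bedrooms": 3, "age_years": 15,
      "garage": True, "neighborhood": "Suburban"}, 285_000),
    ({"size_sqft": 850, "bedrooms": 1, "age_years": 30,
      "garage": False, "neighborhood": "Urban"}, 165_000),
    ({"size_sqft": 2200, "bedrooms": 4, "age_years": 5,
      "garage": True, "neighborhood": "Suburban"}, 425_000),
]

# Separate into X (features) and y (labels) -- the conventional names.
X = [features for features, _ in training_data]
y = [label for _, label in training_data]
```

By convention, `X` holds the inputs and `y` holds the target the model must learn to predict.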
The Two Flavors of Supervised Learning
Every supervised learning problem falls into one of two categories:
```mermaid
flowchart TD
    SL["Supervised Learning"] --> REG["Regression"]
    SL --> CLS["Classification"]
    REG --> R1["Predict a continuous number"]
    REG --> R2["Examples: price, temperature, age"]
    CLS --> C1["Predict a category/class"]
    CLS --> C2["Examples: spam/not spam, cat/dog"]
    style SL fill:#f3e8fd,stroke:#5b1a7a,color:#5b1a7a
    style REG fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style CLS fill:#e8f4fd,stroke:#1a5276,color:#1a5276
```
| Aspect | Regression | Classification |
|---|---|---|
| Output type | Continuous number | Discrete category |
| Example output | $285,000 | “Spam” or “Not Spam” |
| Loss function | Mean Squared Error | Cross-entropy |
| Evaluation | MAE, RMSE, R² | Accuracy, Precision, Recall, F1 |
| Algorithms | Linear Regression, Random Forest | Logistic Regression, SVM, Decision Tree |
We’ll cover each in detail in the next two posts. For now, let’s understand the full supervised learning pipeline.
The Training Process
Step 1: Split the Data
You never train and evaluate on the same data. A common split:
```mermaid
pie title Data Split
    "Training Set (70%)" : 70
    "Validation Set (15%)" : 15
    "Test Set (15%)" : 15
```
- Training set: The model learns from this
- Validation set: Used during training to tune hyperparameters
- Test set: Final evaluation - the model never sees this until the end
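A minimal sketch of this split, assuming a 70/15/15 division and a fixed random seed for reproducibility (the function name and signature are illustrative choices):

```python
import random

def split_data(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle rows and split them into train / validation / test sets."""
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # shuffle first so splits are unbiased
    n = len(rows)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])

train, val, test = split_data(list(range(100)))
# 100 rows -> 70 train, 15 validation, 15 test
```

Shuffling before splitting matters: if the data is sorted (say, by price), a naive slice would give the model a biased view of each set.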
Step 2: The Model Learns
During training, the model:
1. Takes a batch of training examples
2. Makes predictions
3. Computes the loss (how far off the predictions are)
4. Adjusts its internal parameters to reduce the loss
5. Repeats for many iterations (epochs)
```mermaid
flowchart TD
    A["Input Batch"] --> B["Model Prediction"]
    B --> C["Compare with True Labels"]
    C --> D["Compute Loss"]
    D --> E["Update Model Parameters"]
    E --> A
    style D fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
    style E fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
```
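The loop above can be sketched in a few lines with the simplest possible model: a one-feature linear model `y ≈ w*x + b` trained by gradient descent on mean squared error. The toy data and learning rate here are illustrative, not from the post:

```python
# Tiny dataset following y = 2x + 1 exactly.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0     # the model's internal parameters, starting untrained
lr = 0.05           # learning rate: how big each adjustment is

for epoch in range(2000):                  # repeat for many iterations
    grad_w = grad_b = 0.0
    for x, y_true in data:
        error = (w * x + b) - y_true       # prediction minus true label
        grad_w += 2 * error * x / len(data)  # gradient of MSE w.r.t. w
        grad_b += 2 * error / len(data)      # gradient of MSE w.r.t. b
    w -= lr * grad_w                       # update parameters to reduce loss
    b -= lr * grad_b

# After training, w is close to 2 and b is close to 1.
```

Each pass does exactly the four boxes in the diagram: predict, compare, compute the loss gradient, update.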
Step 3: Evaluate
After training, we check how well the model generalizes to unseen data using the test set.
Feature Types
Real-world data comes in different types, and the model needs to handle each:
| Feature Type | Description | Example | How to Handle |
|---|---|---|---|
| Numerical | Continuous or discrete numbers | Size: 1,400 sq ft | Use directly (may need scaling) |
| Categorical | Discrete labels | Neighborhood: “Urban” | One-hot encode or label encode |
| Binary | Yes/No, True/False | Garage: Yes | Convert to 0/1 |
| Ordinal | Categories with order | Level: Low, Med, High | Label encode with order |
One-Hot Encoding Example
The “Neighborhood” column can’t be fed as text to a model. We convert it:
| Neighborhood | is_Urban | is_Suburban | is_Rural |
|---|---|---|---|
| Urban | 1 | 0 | 0 |
| Suburban | 0 | 1 | 0 |
| Rural | 0 | 0 | 1 |
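The table above can be produced with a couple of lines; a minimal sketch (the helper name `one_hot` is a naming choice, and real pipelines would typically use a library encoder instead):

```python
def one_hot(value, categories):
    """Encode a categorical value as a one-hot vector over known categories."""
    return [1 if value == category else 0 for category in categories]

neighborhoods = ["Urban", "Suburban", "Rural"]
encoded = one_hot("Suburban", neighborhoods)   # [0, 1, 0]
```

Note that the list of categories must be fixed up front, from the training data, so that every row gets a vector of the same length.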
Overfitting vs Underfitting
The most important concept in supervised learning: your model must generalize well.
```mermaid
flowchart LR
    UF["Underfitting (too simple)"] --- GF["Good Fit (just right)"]
    GF --- OF["Overfitting (too complex)"]
    style UF fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
    style GF fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style OF fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
```
| State | Training Accuracy | Test Accuracy | Problem |
|---|---|---|---|
| Underfitting | Low | Low | Model too simple, misses patterns |
| Good fit | High | High | Model captures real patterns |
| Overfitting | Very High | Low | Model memorized training data |
How to Detect Overfitting
If your training error is very low but validation error is high, you’re overfitting.
How to Fix It
- More data - more examples reduce overfitting
- Simpler model - fewer parameters, less complexity
- Regularization - add penalty for overly complex models
- Cross-validation - evaluate on multiple data splits
- Early stopping - stop training when validation error starts increasing
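Early stopping is simple enough to sketch directly. The idea: track validation error each epoch, remember the best epoch so far, and stop once it has not improved for a few epochs in a row (the `patience` parameter here is an illustrative hyperparameter, not from the post):

```python
def early_stopping(val_errors, patience=3):
    """Return the best epoch: the point where validation error last improved,
    scanning until `patience` epochs pass with no improvement."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                       # validation error stopped improving
    return best_epoch

# A classic overfitting curve: validation error falls, then rises again.
stop_epoch = early_stopping([0.9, 0.7, 0.5, 0.45, 0.5, 0.6, 0.7])
```

In this example training would be stopped and the parameters from epoch 3, where validation error bottomed out, would be kept.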
Real-Life Supervised Learning Applications
| Application | Input Features | Label | Type |
|---|---|---|---|
| House price prediction | Size, location, age | Price ($) | Regression |
| Email spam detection | Words, sender, links | Spam / Not Spam | Classification |
| Medical diagnosis | Symptoms, lab results | Disease / Healthy | Classification |
| Stock price forecasting | Historical prices, volume | Future price | Regression |
| Customer churn | Usage, tenure, complaints | Will leave? Yes/No | Classification |
| Weather forecasting | Temperature, humidity, wind | Tomorrow’s temp | Regression |
| Loan default prediction | Income, credit score, debt | Default? Yes/No | Classification |
| Image recognition | Pixel values | Object label | Classification |
The Bias-Variance Tradeoff
Every supervised model faces a fundamental tension:
- Bias: Error from overly simple assumptions. High bias → underfitting.
- Variance: Error from sensitivity to training data. High variance → overfitting.
| | Low Variance | High Variance |
|---|---|---|
| Low Bias | ✅ Ideal (good generalization) | ⚠️ Overfitting |
| High Bias | ⚠️ Underfitting | ❌ Worst case (both problems) |
The goal is to find the sweet spot - a model complex enough to capture real patterns but not so complex that it memorizes noise.
Evaluation Metrics
Different tasks need different metrics:
For Regression
- MAE (Mean Absolute Error): Average magnitude of errors
- RMSE (Root Mean Squared Error): Penalizes large errors more
- R² (R-squared): Fraction of variance the model explains (1 is a perfect fit; 0 means no better than always predicting the mean)
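All three regression metrics fit in a few lines each; a minimal sketch with hand-picked toy values (function names are naming choices):

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: squaring penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """R²: 1 minus (residual error / error of always predicting the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# A model that always predicts 2 for targets [1, 2, 3]:
# it does exactly as well as predicting the mean, so R² is 0.
score = r_squared([1, 2, 3], [2, 2, 2])
```

Comparing MAE and RMSE on the same predictions is a quick check for outliers: RMSE growing much faster than MAE means a few predictions are badly off.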
For Classification
- Accuracy: % of correct predictions (can be misleading with imbalanced data)
- Precision: Of predicted positives, how many are actually positive?
- Recall: Of actual positives, how many did we catch?
- F1 Score: Harmonic mean of precision and recall
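These classification metrics all derive from counting true/false positives and negatives; a minimal sketch (the function name and the `positive=1` convention are illustrative choices):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives...
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives...
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return precision, recall, f1

# Two actual positives; the model catches one and raises one false alarm.
p, r, f1 = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

Here precision, recall, and F1 all come out to 0.5: one of the two predicted positives is right (precision), and one of the two actual positives is caught (recall).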
What’s Next?
Now that you understand the supervised learning framework, we’ll dive deep into the two subtypes:
```mermaid
flowchart LR
    A["✅ Intro to ML"] --> B["✅ Supervised Learning"]
    B --> C["Regression"]
    C --> D["Classification"]
    D --> E["Unsupervised Learning"]
    style A fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style B fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style C fill:#e8f4fd,stroke:#1a5276,color:#1a5276
```
In the next post, we’ll explore Regression - predicting continuous values. We’ll build a linear regression model step by step, derive the math, and visualize how the model fits data.
See you in Part 3.