ML from Scratch · Part 2

Supervised Learning - Learning from Labeled Data

In this series (5 parts)
  1. Introduction to Machine Learning
  2. Supervised Learning - Learning from Labeled Data
  3. Regression - Predicting Continuous Values
  4. Classification - Predicting Categories
  5. Unsupervised Learning - Finding Hidden Patterns

In the previous post, we defined machine learning as teaching computers to learn from data. Now let’s look at the most common and practical approach: supervised learning.

What is Supervised Learning?

Supervised learning is when you train a model using labeled data - meaning each training example comes with both the input and the correct answer.

Think of it like a student learning with an answer key:

```mermaid
flowchart LR
  A["Training Data
(inputs + labels)"] --> B["Learning Algorithm"]
  B --> C["Trained Model"]
  C --> D["New Input"]
  D --> E["Prediction"]
  style B fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
  style C fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
```

The model sees thousands of examples, compares its predictions to the correct answers, and adjusts itself to minimize errors. Once trained, it can make predictions on data it’s never seen before.

A Real-World Dataset

Let’s work with a concrete example: predicting house prices. Here’s a sample of our training data:

| Size (sq ft) | Bedrooms | Age (years) | Garage | Neighborhood | Price ($) |
|---|---|---|---|---|---|
| 1,400 | 3 | 15 | Yes | Suburban | 285,000 |
| 850 | 1 | 30 | No | Urban | 165,000 |
| 2,200 | 4 | 5 | Yes | Suburban | 425,000 |
| 1,100 | 2 | 20 | No | Urban | 210,000 |
| 3,000 | 5 | 2 | Yes | Rural | 380,000 |
| 1,600 | 3 | 10 | Yes | Urban | 340,000 |
| 900 | 2 | 25 | No | Suburban | 195,000 |
| 2,500 | 4 | 8 | Yes | Suburban | 465,000 |

In this dataset:

  • Features (inputs): Size, Bedrooms, Age, Garage, Neighborhood
  • Label (target): Price

The model’s job is to learn the relationship between features and the label so it can predict the price of a house it’s never seen.
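To make the features/label split concrete, here are the first few rows of the table as plain Python structures (a sketch for illustration; a real project would more likely load this into a pandas DataFrame):

```python
# The house-price table as plain Python structures (first four rows).
# Each row pairs a feature vector with its label (the price).
rows = [
    # (size_sqft, bedrooms, age_years, has_garage, neighborhood, price)
    (1400, 3, 15, True,  "Suburban", 285_000),
    (850,  1, 30, False, "Urban",    165_000),
    (2200, 4, 5,  True,  "Suburban", 425_000),
    (1100, 2, 20, False, "Urban",    210_000),
]

# Split each row into features (inputs) and label (target).
X = [row[:-1] for row in rows]  # everything except the price
y = [row[-1] for row in rows]   # the price column

print(X[0])  # (1400, 3, 15, True, 'Suburban')
print(y[0])  # 285000
```

Every supervised learning library expects data in roughly this shape: a matrix of features `X` and a vector of labels `y`.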

The Two Flavors of Supervised Learning

Every supervised learning problem falls into one of two categories:

```mermaid
flowchart TD
  SL["Supervised Learning"] --> REG["Regression"]
  SL --> CLS["Classification"]
  REG --> R1["Predict a continuous number"]
  REG --> R2["Examples: price, temperature, age"]
  CLS --> C1["Predict a category/class"]
  CLS --> C2["Examples: spam/not spam, cat/dog"]
  style SL fill:#f3e8fd,stroke:#5b1a7a,color:#5b1a7a
  style REG fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
  style CLS fill:#e8f4fd,stroke:#1a5276,color:#1a5276
```

| Aspect | Regression | Classification |
|---|---|---|
| Output type | Continuous number | Discrete category |
| Example output | $285,000 | "Spam" or "Not Spam" |
| Loss function | Mean Squared Error | Cross-entropy |
| Evaluation | MAE, RMSE, R² | Accuracy, Precision, Recall, F1 |
| Algorithms | Linear Regression, Random Forest | Logistic Regression, SVM, Decision Tree |

We’ll cover each in detail in the next two posts. For now, let’s understand the full supervised learning pipeline.

The Training Process

Step 1: Split the Data

You never train and evaluate on the same data. A common split:

```mermaid
pie title Data Split
  "Training Set (70%)" : 70
  "Validation Set (15%)" : 15
  "Test Set (15%)" : 15
```

  • Training set: The model learns from this
  • Validation set: Used during training to tune hyperparameters
  • Test set: Final evaluation - the model never sees this until the end
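A 70/15/15 split is a few lines of plain Python (a minimal sketch; in practice you would likely use scikit-learn's `train_test_split`, and the `split_data` helper below is illustrative):

```python
import random

def split_data(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle and split rows into train / validation / test sets."""
    rows = rows[:]                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)  # seeded shuffle for reproducibility
    n = len(rows)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]      # the remainder (~15%)
    return train, val, test

train, val, test = split_data(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before splitting matters: if the rows are sorted (say, by price), a straight slice would give the model a biased view of the data.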

Step 2: The Model Learns

During training, the model:

  1. Takes a batch of training examples
  2. Makes predictions
  3. Computes the loss (how far off the predictions are)
  4. Adjusts its internal parameters to reduce the loss
  5. Repeats for many iterations (epochs)
```mermaid
flowchart TD
  A["Input Batch"] --> B["Model Prediction"]
  B --> C["Compare with True Labels"]
  C --> D["Compute Loss"]
  D --> E["Update Model Parameters"]
  E --> A
  style D fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
  style E fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
```
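The loop above can be sketched end to end for the simplest possible model, a line `y = w*x + b`, trained with gradient descent on mean squared error (toy data and illustrative names; real training loops batch the data and use a library):

```python
# Toy data generated from y = 2x + 1, so training should recover w≈2, b≈1.
data = [(x, 2 * x + 1) for x in range(10)]
n = len(data)

w, b = 0.0, 0.0
lr = 0.01
for epoch in range(2000):             # step 5: repeat for many epochs
    grad_w = grad_b = 0.0
    for x, y_true in data:            # step 1: take the training examples
        y_pred = w * x + b            # step 2: make a prediction
        error = y_pred - y_true       # step 3: loss is the mean of error**2
        grad_w += 2 * error * x / n   # gradient of the loss w.r.t. w
        grad_b += 2 * error / n       # gradient of the loss w.r.t. b
    w -= lr * grad_w                  # step 4: nudge parameters downhill
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0
```

Every supervised learner, from linear regression to deep networks, is some elaboration of this predict/compare/update cycle.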

Step 3: Evaluate

After training, we check how well the model generalizes to unseen data using the test set.

Feature Types

Real-world data comes in different types, and the model needs to handle each:

| Feature Type | Description | Example | How to Handle |
|---|---|---|---|
| Numerical | Continuous or discrete numbers | Size: 1,400 sq ft | Use directly (may need scaling) |
| Categorical | Discrete labels | Neighborhood: "Urban" | One-hot encode or label encode |
| Binary | Yes/No, True/False | Garage: Yes | Convert to 0/1 |
| Ordinal | Categories with order | Level: Low, Med, High | Label encode with order |

One-Hot Encoding Example

The “Neighborhood” column can’t be fed as text to a model. We convert it:

| Neighborhood | is_Urban | is_Suburban | is_Rural |
|---|---|---|---|
| Urban | 1 | 0 | 0 |
| Suburban | 0 | 1 | 0 |
| Rural | 0 | 0 | 1 |
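A one-hot encoder is a few lines of plain Python (the `one_hot` helper below is illustrative; in practice you would likely reach for `pandas.get_dummies` or scikit-learn's `OneHotEncoder`):

```python
def one_hot(values, categories=None):
    """One-hot encode a list of categorical values into 0/1 vectors."""
    if categories is None:
        categories = sorted(set(values))  # fix a consistent column order
    return [[1 if v == c else 0 for c in categories] for v in values]

neighborhoods = ["Urban", "Suburban", "Rural", "Urban"]
encoded = one_hot(neighborhoods, categories=["Urban", "Suburban", "Rural"])
print(encoded[0])  # [1, 0, 0]  -> is_Urban, is_Suburban, is_Rural
```

Note that the category order must be fixed once and reused at prediction time; otherwise the same neighborhood could map to different columns.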

Overfitting vs Underfitting

The most important concept in supervised learning: your model must generalize well.

```mermaid
flowchart LR
  UF["Underfitting
(too simple)"] --- GF["Good Fit
(just right)"] --- OF["Overfitting
(too complex)"]
  style UF fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
  style GF fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
  style OF fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
```

| State | Training Accuracy | Test Accuracy | Problem |
|---|---|---|---|
| Underfitting | Low | Low | Model too simple, misses patterns |
| Good fit | High | High | Model captures real patterns |
| Overfitting | Very High | Low | Model memorized training data |

How to Detect Overfitting

If your training error is very low but validation error is high, you’re overfitting.

How to Fix It

  • More data - more examples reduce overfitting
  • Simpler model - fewer parameters, less complexity
  • Regularization - add penalty for overly complex models
  • Cross-validation - evaluate on multiple data splits
  • Early stopping - stop training when validation error starts increasing
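Early stopping, the last fix above, can be sketched as a wrapper around any training loop (the function and its arguments are illustrative, not from a specific library):

```python
def train_with_early_stopping(train_step, val_loss_fn, patience=3, max_epochs=100):
    """Stop when validation loss hasn't improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()           # one pass over the training data
        loss = val_loss_fn()   # evaluate on the validation set
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch + 1   # stopped early
    return max_epochs

# Simulated validation losses: improving for 3 epochs, then worsening.
losses = iter([5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 3.8])
stopped_at = train_with_early_stopping(lambda: None, lambda: next(losses))
print(stopped_at)  # 6
```

The `patience` parameter guards against stopping on a single noisy epoch: training only halts after the validation loss has failed to improve several epochs in a row.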

Real-Life Supervised Learning Applications

ApplicationInput FeaturesLabelType
House price predictionSize, location, agePrice ($)Regression
Email spam detectionWords, sender, linksSpam / Not SpamClassification
Medical diagnosisSymptoms, lab resultsDisease / HealthyClassification
Stock price forecastingHistorical prices, volumeFuture priceRegression
Customer churnUsage, tenure, complaintsWill leave? Yes/NoClassification
Weather forecastingTemperature, humidity, windTomorrow’s tempRegression
Loan default predictionIncome, credit score, debtDefault? Yes/NoClassification
Image recognitionPixel valuesObject labelClassification

The Bias-Variance Tradeoff

Every supervised model faces a fundamental tension:

  • Bias: Error from overly simple assumptions. High bias → underfitting.
  • Variance: Error from sensitivity to training data. High variance → overfitting.

|  | Low Variance | High Variance |
|---|---|---|
| Low Bias | ✅ Ideal (good generalization) | ⚠️ Overfitting |
| High Bias | ⚠️ Underfitting | ❌ Worst case (both problems) |

The goal is to find the sweet spot - a model complex enough to capture real patterns but not so complex that it memorizes noise.

Evaluation Metrics

Different tasks need different metrics:

For Regression

  • MAE (Mean Absolute Error): Average magnitude of errors
  • RMSE (Root Mean Squared Error): Penalizes large errors more
  • R² (R-squared): How much variance the model explains (0 to 1)
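Each regression metric is a short formula; a minimal sketch in plain Python (illustrative helpers; scikit-learn ships equivalents in `sklearn.metrics`):

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: squaring penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """R²: fraction of the label's variance the model explains."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
print(mae(y_true, y_pred))        # 0.5
print(round(r_squared(y_true, y_pred), 3))  # 0.925
```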

For Classification

  • Accuracy: % of correct predictions (can be misleading with imbalanced data)
  • Precision: Of predicted positives, how many are actually positive?
  • Recall: Of actual positives, how many did we catch?
  • F1 Score: Harmonic mean of precision and recall
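All four classification metrics fall out of the confusion counts (true/false positives and false negatives); a minimal sketch with illustrative names:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives...
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives...
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc)  # 0.75
```

Note how accuracy alone can mislead: a spam filter that labels everything "not spam" on a dataset that is 95% legitimate mail scores 95% accuracy but 0 recall.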

What’s Next?

Now that you understand the supervised learning framework, we’ll dive deep into the two subtypes:

```mermaid
flowchart LR
  A["✅ Intro to ML"] --> B["✅ Supervised Learning"]
  B --> C["Regression"]
  C --> D["Classification"]
  D --> E["Unsupervised Learning"]
  style A fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
  style B fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
  style C fill:#e8f4fd,stroke:#1a5276,color:#1a5276
```

In the next post, we’ll explore Regression - predicting continuous values. We’ll build a linear regression model step by step, derive the math, and visualize how the model fits data.

See you in Part 3.
