Supervised Learning - Learning from Labeled Data
In this series (5 parts)
- Introduction to Machine Learning
- Supervised Learning - Learning from Labeled Data
- Regression - Predicting Continuous Values
- Classification - Predicting Categories
- Unsupervised Learning - Finding Hidden Patterns
In the previous post, we defined machine learning as teaching computers to learn from data. Now let’s look at the most common and practical approach: supervised learning.
What is Supervised Learning?
Supervised learning is when you train a model using labeled data - meaning each training example comes with both the input and the correct answer.
Think of it like a student learning with an answer key:
```mermaid
flowchart LR
    A["Training Data (inputs + labels)"] --> B["Learning Algorithm"]
    B --> C["Trained Model"]
    C --> D["New Input"]
    D --> E["Prediction"]
    style B fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style C fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
```
The model sees thousands of examples, compares its predictions to the correct answers, and adjusts itself to minimize errors. Once trained, it can make predictions on data it’s never seen before.
A Real-World Dataset
Let’s work with a concrete example: predicting house prices. Here’s a sample of our training data:
| Size (sq ft) | Bedrooms | Age (years) | Garage | Neighborhood | Price ($) |
|---|---|---|---|---|---|
| 1,400 | 3 | 15 | Yes | Suburban | 285,000 |
| 850 | 1 | 30 | No | Urban | 165,000 |
| 2,200 | 4 | 5 | Yes | Suburban | 425,000 |
| 1,100 | 2 | 20 | No | Urban | 210,000 |
| 3,000 | 5 | 2 | Yes | Rural | 380,000 |
| 1,600 | 3 | 10 | Yes | Urban | 340,000 |
| 900 | 2 | 25 | No | Suburban | 195,000 |
| 2,500 | 4 | 8 | Yes | Suburban | 465,000 |
In this dataset:
- Features (inputs): Size, Bedrooms, Age, Garage, Neighborhood
- Label (target): Price
The model’s job is to learn the relationship between features and the label so it can predict the price of a house it’s never seen.
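In code, this feature/label separation is usually the first step. Here is a minimal sketch using three rows from the table above, with illustrative field names (`size_sqft`, `bedrooms`, and so on are naming choices, not a fixed API):

```python
# A few rows from the house-price table as (features, label) pairs.
training_data = [
    ({"size_sqft": 1400, "bedrooms": 3, "age_years": 15,
      "garage": True, "neighborhood": "Suburban"}, 285_000),
    ({"size_sqft": 850, "bedrooms": 1, "age_years": 30,
      "garage": False, "neighborhood": "Urban"}, 165_000),
    ({"size_sqft": 2200, "bedrooms": 4, "age_years": 5,
      "garage": True, "neighborhood": "Suburban"}, 425_000),
]

# Separate into X (features) and y (labels) -- the conventional names.
X = [features for features, _ in training_data]
y = [label for _, label in training_data]
```

By convention, `X` holds the inputs and `y` holds the target the model must learn to predict.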
The Two Flavors of Supervised Learning
Every supervised learning problem falls into one of two categories:
```mermaid
flowchart TD
    SL["Supervised Learning"] --> REG["Regression"]
    SL --> CLS["Classification"]
    REG --> R1["Predict a continuous number"]
    REG --> R2["Examples: price, temperature, age"]
    CLS --> C1["Predict a category/class"]
    CLS --> C2["Examples: spam/not spam, cat/dog"]
    style SL fill:#f3e8fd,stroke:#5b1a7a,color:#5b1a7a
    style REG fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style CLS fill:#e8f4fd,stroke:#1a5276,color:#1a5276
```
| Aspect | Regression | Classification |
|---|---|---|
| Output type | Continuous number | Discrete category |
| Example output | $285,000 | “Spam” or “Not Spam” |
| Loss function | Mean Squared Error | Cross-entropy |
| Evaluation | MAE, RMSE, R² | Accuracy, Precision, Recall, F1 |
| Algorithms | Linear Regression, Random Forest | Logistic Regression, SVM, Decision Tree |
We’ll cover each in detail in the next two posts. For now, let’s understand the full supervised learning pipeline.
The Training Process
Step 1: Split the Data
You never train and evaluate on the same data. A common split:
```mermaid
pie title Data Split
    "Training Set (70%)" : 70
    "Validation Set (15%)" : 15
    "Test Set (15%)" : 15
```
- Training set: The model learns from this
- Validation set: Used during training to tune hyperparameters
- Test set: Final evaluation - the model never sees this until the end
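A minimal sketch of this split, assuming a 70/15/15 division and a fixed random seed for reproducibility (the function name and signature are illustrative choices):

```python
import random

def split_data(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle rows and split them into train / validation / test sets."""
    rows = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)   # shuffle first so splits are unbiased
    n = len(rows)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])

train, val, test = split_data(list(range(100)))
# 100 rows -> 70 train, 15 validation, 15 test
```

Shuffling before splitting matters: if the data is sorted (say, by price), a naive slice would give the model a biased view of each set.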
Step 2: The Model Learns
During training, the model:
1. Takes a batch of training examples
2. Makes predictions
3. Computes the loss (how far off the predictions are)
4. Adjusts its internal parameters to reduce the loss
5. Repeats for many iterations (epochs)
```mermaid
flowchart TD
    A["Input Batch"] --> B["Model Prediction"]
    B --> C["Compare with True Labels"]
    C --> D["Compute Loss"]
    D --> E["Update Model Parameters"]
    E --> A
    style D fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
    style E fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
```
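The loop above can be sketched in a few lines with the simplest possible model: a one-feature linear model `y ≈ w*x + b` trained by gradient descent on mean squared error. The toy data and learning rate here are illustrative, not from the post:

```python
# Tiny dataset following y = 2x + 1 exactly.
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0     # the model's internal parameters, starting untrained
lr = 0.05           # learning rate: how big each adjustment is

for epoch in range(2000):                  # repeat for many iterations
    grad_w = grad_b = 0.0
    for x, y_true in data:
        error = (w * x + b) - y_true       # prediction minus true label
        grad_w += 2 * error * x / len(data)  # gradient of MSE w.r.t. w
        grad_b += 2 * error / len(data)      # gradient of MSE w.r.t. b
    w -= lr * grad_w                       # update parameters to reduce loss
    b -= lr * grad_b

# After training, w is close to 2 and b is close to 1.
```

Each pass does exactly the four boxes in the diagram: predict, compare, compute the loss gradient, update.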
Step 3: Evaluate
After training, we check how well the model generalizes to unseen data using the test set.
Feature Types
Real-world data comes in different types, and the model needs to handle each:
| Feature Type | Description | Example | How to Handle |
|---|---|---|---|
| Numerical | Continuous or discrete numbers | Size: 1,400 sq ft | Use directly (may need scaling) |
| Categorical | Discrete labels | Neighborhood: “Urban” | One-hot encode or label encode |
| Binary | Yes/No, True/False | Garage: Yes | Convert to 0/1 |
| Ordinal | Categories with order | Level: Low, Med, High | Label encode with order |
One-Hot Encoding Example
The “Neighborhood” column can’t be fed as text to a model. We convert it:
| Neighborhood | is_Urban | is_Suburban | is_Rural |
|---|---|---|---|
| Urban | 1 | 0 | 0 |
| Suburban | 0 | 1 | 0 |
| Rural | 0 | 0 | 1 |
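The table above can be produced with a couple of lines; a minimal sketch (the helper name `one_hot` is a naming choice, and real pipelines would typically use a library encoder instead):

```python
def one_hot(value, categories):
    """Encode a categorical value as a one-hot vector over known categories."""
    return [1 if value == category else 0 for category in categories]

neighborhoods = ["Urban", "Suburban", "Rural"]
encoded = one_hot("Suburban", neighborhoods)   # [0, 1, 0]
```

Note that the list of categories must be fixed up front, from the training data, so that every row gets a vector of the same length.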
Overfitting vs Underfitting
The most important concept in supervised learning: your model must generalize well.
```mermaid
flowchart LR
    UF["Underfitting (too simple)"] --- GF["Good Fit (just right)"]
    GF --- OF["Overfitting (too complex)"]
    style UF fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
    style GF fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style OF fill:#fde8e8,stroke:#7a1a1a,color:#7a1a1a
```
| State | Training Accuracy | Test Accuracy | Problem |
|---|---|---|---|
| Underfitting | Low | Low | Model too simple, misses patterns |
| Good fit | High | High | Model captures real patterns |
| Overfitting | Very High | Low | Model memorized training data |
How to Detect Overfitting
If your training error is very low but validation error is high, you’re overfitting.
How to Fix It
- More data - more examples reduce overfitting
- Simpler model - fewer parameters, less complexity
- Regularization - add penalty for overly complex models
- Cross-validation - evaluate on multiple data splits
- Early stopping - stop training when validation error starts increasing
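Early stopping is simple enough to sketch directly. The idea: track validation error each epoch, remember the best epoch so far, and stop once it has not improved for a few epochs in a row (the `patience` parameter here is an illustrative hyperparameter, not from the post):

```python
def early_stopping(val_errors, patience=3):
    """Return the best epoch: the point where validation error last improved,
    scanning until `patience` epochs pass with no improvement."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                       # validation error stopped improving
    return best_epoch

# A classic overfitting curve: validation error falls, then rises again.
stop_epoch = early_stopping([0.9, 0.7, 0.5, 0.45, 0.5, 0.6, 0.7])
```

In this example training would be stopped and the parameters from epoch 3, where validation error bottomed out, would be kept.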
Real-Life Supervised Learning Applications
| Application | Input Features | Label | Type |
|---|---|---|---|
| House price prediction | Size, location, age | Price ($) | Regression |
| Email spam detection | Words, sender, links | Spam / Not Spam | Classification |
| Medical diagnosis | Symptoms, lab results | Disease / Healthy | Classification |
| Stock price forecasting | Historical prices, volume | Future price | Regression |
| Customer churn | Usage, tenure, complaints | Will leave? Yes/No | Classification |
| Weather forecasting | Temperature, humidity, wind | Tomorrow’s temp | Regression |
| Loan default prediction | Income, credit score, debt | Default? Yes/No | Classification |
| Image recognition | Pixel values | Object label | Classification |
The Bias-Variance Tradeoff
Every supervised model faces a fundamental tension:
- Bias: Error from overly simple assumptions. High bias → underfitting.
- Variance: Error from sensitivity to training data. High variance → overfitting.
| | Low Variance | High Variance |
|---|---|---|
| Low Bias | ✅ Ideal (good generalization) | ⚠️ Overfitting |
| High Bias | ⚠️ Underfitting | ❌ Worst case (both problems) |
The goal is to find the sweet spot - a model complex enough to capture real patterns but not so complex that it memorizes noise.
Evaluation Metrics
Different tasks need different metrics:
For Regression
- MAE (Mean Absolute Error): Average magnitude of errors
- RMSE (Root Mean Squared Error): Penalizes large errors more
- R² (R-squared): Fraction of variance the model explains (1 is a perfect fit; 0 means no better than always predicting the mean)
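All three regression metrics fit in a few lines each; a minimal sketch with hand-picked toy values (function names are naming choices):

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: squaring penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """R²: 1 minus (residual error / error of always predicting the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# A model that always predicts 2 for targets [1, 2, 3]:
# it does exactly as well as predicting the mean, so R² is 0.
score = r_squared([1, 2, 3], [2, 2, 2])
```

Comparing MAE and RMSE on the same predictions is a quick check for outliers: RMSE growing much faster than MAE means a few predictions are badly off.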
For Classification
- Accuracy: % of correct predictions (can be misleading with imbalanced data)
- Precision: Of predicted positives, how many are actually positive?
- Recall: Of actual positives, how many did we catch?
- F1 Score: Harmonic mean of precision and recall
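These classification metrics all derive from counting true/false positives and negatives; a minimal sketch (the function name and the `positive=1` convention are illustrative choices):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives...
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives...
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return precision, recall, f1

# Two actual positives; the model catches one and raises one false alarm.
p, r, f1 = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

Here precision, recall, and F1 all come out to 0.5: one of the two predicted positives is right (precision), and one of the two actual positives is caught (recall).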
What’s Next?
Now that you understand the supervised learning framework, we’ll dive deep into the two subtypes:
```mermaid
flowchart LR
    A["✅ Intro to ML"] --> B["✅ Supervised Learning"]
    B --> C["Regression"]
    C --> D["Classification"]
    D --> E["Unsupervised Learning"]
    style A fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style B fill:#e8f8f0,stroke:#1a5c38,color:#1a5c38
    style C fill:#e8f4fd,stroke:#1a5276,color:#1a5276
```
In the next post, we’ll explore Regression - predicting continuous values. We’ll build a linear regression model step by step, derive the math, and visualize how the model fits data.
See you in Part 3.