What is machine learning: a map of the field
In this series (18 parts)
- What is machine learning: a map of the field
- Data, features, and the ML pipeline
- Linear regression
- Bias, variance, and the tradeoff
- Regularization: Ridge, Lasso, and ElasticNet
- Logistic regression and classification
- Evaluation metrics for classification
- Naive Bayes classifier
- K-Nearest Neighbors
- Decision trees
- Ensemble methods: Bagging and Random Forests
- Boosting: AdaBoost and Gradient Boosting
- Support Vector Machines
- K-Means clustering
- Dimensionality Reduction: PCA
- Gaussian mixture models and EM algorithm
- Model selection and cross-validation
- Feature engineering and selection
Machine learning is programming where you don’t write the rules. You give the computer data, tell it what the right answers look like, and it figures out the rules on its own. That single idea powers everything from spam filters to self-driving cars.
Traditional programming vs machine learning
In traditional programming, you write explicit rules:
```python
def classify_email(email):
    if "viagra" in email.lower():
        return "spam"
    if "winner" in email.lower() and "click here" in email.lower():
        return "spam"
    return "not spam"
```
You sit down, think about patterns, and encode them by hand. This works fine when you have a small, well-understood set of rules. But what happens when spammers change their wording? You update rules forever.
Machine learning flips this around. Instead of writing rules, you provide examples:
| Email text | Label |
|---|---|
| "Buy cheap medicine now" | spam |
| "Meeting at 3pm tomorrow" | not spam |
| "You won a free iPhone" | spam |
| "Quarterly report attached" | not spam |
The ML algorithm looks at these examples and learns a function that maps email text to labels. When a new email arrives, the learned function produces a prediction without you writing a single if statement.
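To make "learning from examples" concrete, here is one way a classifier can be learned from the four emails above: a tiny word-count Naive Bayes model (an algorithm covered in depth later in this series), sketched in plain Python. The training data matches the table; everything else is illustrative.

```python
from collections import Counter
import math

# Toy training set: the four example emails from the table above.
train = [
    ("Buy cheap medicine now", "spam"),
    ("Meeting at 3pm tomorrow", "not spam"),
    ("You won a free iPhone", "spam"),
    ("Quarterly report attached", "not spam"),
]

# Count how often each word appears under each label.
word_counts = {"spam": Counter(), "not spam": Counter()}
label_counts = Counter()
for text, label in train:
    label_counts[label] += 1
    word_counts[label].update(text.lower().split())

vocab = {w for counts in word_counts.values() for w in counts}

def classify(text):
    """Pick the label that makes the email's words most probable."""
    best_label, best_score = None, float("-inf")
    for label in word_counts:
        # Log prior + sum of Laplace-smoothed log word likelihoods.
        score = math.log(label_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

On this toy data, `classify("free medicine for you")` comes out as spam and `classify("quarterly meeting tomorrow")` as not spam. Notice there is not a single hand-written if-rule about word content: the "rules" live entirely in the counted frequencies.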
Example 1: learning a simple rule from data
Suppose you have four houses with their sizes and prices:
| Size (sq ft) | Price ($k) |
|---|---|
| 800 | 160 |
| 1000 | 200 |
| 1200 | 240 |
| 1500 | 300 |
A traditional programmer might eyeball this and write `price = 0.2 * size`. An ML algorithm does the same thing, but automatically. It starts with a guess, say $w = 0.1$, and adjusts $w$ until predictions match the data.

Let's check: if $w = 0.2$:

$$0.2 \times 800 = 160, \quad 0.2 \times 1000 = 200, \quad 0.2 \times 1200 = 240, \quad 0.2 \times 1500 = 300$$

The algorithm found $w = 0.2$ by minimizing prediction errors across all examples. You never told it the rule. It discovered the rule from data.
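For this one-parameter model, "minimizing prediction errors" even has a closed form: the slope that minimizes squared error is the ratio of the input-output products to the squared inputs. A quick sketch using the table's data:

```python
# House data from the table above: (size in sq ft, price in $k).
data = [(800, 160), (1000, 200), (1200, 240), (1500, 300)]

# Least-squares fit for price = w * size (no intercept term):
# minimizing sum((w*x - y)^2) over w gives w = sum(x*y) / sum(x^2).
w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
print(w)  # 0.2
```

No eyeballing required: the arithmetic recovers exactly the rule a human would have guessed.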
Example 2: why rules break down
Now imagine classifying handwritten digits (0 through 9). Try writing if statements for that. What makes a “7” a “7”? A horizontal stroke at the top and a diagonal stroke going down-left? Some people cross their sevens. Some write them with serifs. The number of rules explodes.
ML handles this naturally. You show the algorithm 60,000 labeled images of digits. Each image is a $28 \times 28$ grid of pixel values, 784 numbers per image. The algorithm learns a function:

$$f: \mathbb{R}^{784} \to \{0, 1, \dots, 9\}$$
This function maps 784 pixel intensities to one of 10 digit classes. No hand-written rules needed.
Where ML sits in the bigger picture
People use “AI,” “machine learning,” and “deep learning” interchangeably, but they are nested concepts:
- Artificial Intelligence is the broadest term. Any system that performs tasks normally requiring human intelligence. This includes rule-based expert systems, search algorithms, and ML.
- Machine Learning is a subset of AI. Systems that learn from data instead of being explicitly programmed.
- Deep Learning is a subset of ML. It uses neural networks with many layers to learn complex patterns.
```mermaid
graph TD
    AI["Artificial Intelligence"] --> ML["Machine Learning"]
    ML --> DL["Deep Learning"]
    AI --> RuleBased["Rule-Based Systems"]
    AI --> Search["Search & Planning"]
    ML --> Classical["Classical ML (trees, SVMs, etc.)"]
```
ML also overlaps heavily with statistics. The difference is mostly cultural. Statisticians tend to focus on inference (understanding relationships in data), while ML practitioners focus on prediction (making accurate guesses on new data). The math is often identical.
The three branches of machine learning
Supervised learning
You give the algorithm input-output pairs. It learns the mapping. This is the most common type of ML in practice.
Two main flavors:
- Regression: output is a continuous number. Predicting house prices, stock returns, temperature.
- Classification: output is a category. Spam or not spam, cat or dog, digit 0 through 9.
The training data looks like $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is the input (features) and $y_i$ is the label (target).

The algorithm finds a function $f$ that minimizes some measure of error between predictions $\hat{y}_i = f(x_i)$ and true labels $y_i$.
Common supervised algorithms include linear regression (for continuous targets), logistic regression (for categories), decision trees, support vector machines, and neural networks. We’ll cover each of these in depth throughout this series.
A simple 2D classification task. Points above the dashed line (high reputation, few suspicious words) are “not spam.” Points below (low reputation, many suspicious words) are “spam.” The classifier learns this boundary from labeled examples.
Unsupervised learning
You give the algorithm inputs only, no labels. It finds structure on its own.
Common tasks:
- Clustering: group similar data points together. Customer segmentation, document grouping.
- Dimensionality reduction: compress high-dimensional data into fewer dimensions while preserving structure. PCA is the classic example.
- Density estimation: learn the underlying probability distribution of the data.
No one tells the algorithm “these customers belong together.” It discovers groupings based on patterns in the features.
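As a sketch of what "discovering groupings" means, here is a minimal k-means loop in plain Python on made-up 2D points (k-means gets its own article later in the series; the data and the choice of initial centroids here are purely illustrative):

```python
# Toy 2D points: two visibly separated groups, no labels given.
points = [(1, 1), (8, 8), (1.5, 2), (9, 9), (1, 0.5), (8, 9)]
k = 2

def dist2(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

# Initialize centroids to the first k points (a common simple choice).
centroids = [points[0], points[1]]
for _ in range(10):  # a few iterations is plenty for this toy data
    # Assignment step: each point joins its nearest centroid's cluster.
    clusters = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [
        (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
        for c in clusters
    ]
```

After a couple of iterations the two groups separate cleanly, even though nothing in the input said which points belong together.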
Reinforcement learning
An agent takes actions in an environment, receives rewards or penalties, and learns a strategy (policy) to maximize total reward over time.
Think of it like training a dog. The dog (agent) tries actions (sit, roll over). You give treats (positive reward) or say “no” (negative reward). Over time the dog learns which actions lead to treats.
Formally, at each time step $t$:
- The agent observes state $s_t$
- Takes action $a_t$
- Receives reward $r_t$
- Transitions to new state $s_{t+1}$

The goal is to learn a policy $\pi$ that maximizes the expected cumulative reward:

$$\mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^t r_t \right]$$

where $\gamma \in [0, 1)$ is a discount factor that makes the agent value immediate rewards more than distant ones.
Reinforcement learning powers game-playing agents (AlphaGo, Atari), robotics, and recommendation systems.
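A minimal sketch of this loop is Q-learning (one standard RL algorithm, not covered in detail in this series) on an invented toy environment: five states on a line, where moving right from the last regular state reaches a goal that pays reward 1 and ends the episode. Everything here, the environment, the constants, the episode count, is an illustrative assumption.

```python
import random

random.seed(0)

# Toy environment (made up for illustration): states 0..4 on a line.
# Reaching state 4 pays reward 1 and ends the episode; all else pays 0.
N_STATES, GOAL = 5, 4
ACTIONS = ("left", "right")

def step(state, action):
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Q(s, a): estimated discounted return of taking a in s, then acting greedily.
alpha, gamma, epsilon = 0.5, 0.9, 0.2
Q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

for episode in range(300):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # Temporal-difference update toward reward + discounted future value.
        target = reward + gamma * max(Q[next_state].values())
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# The learned policy: the greedy action in each non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)}
```

After training, the greedy policy chooses "right" in every state: the reward signal at the goal has propagated backward through the $\gamma$-discounted updates, just like the dog learning which actions lead to treats.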
Key terminology
Before going further, let’s nail down terms you’ll see everywhere:
- Features ($x$): the input variables. House size, number of bedrooms, zip code.
- Labels/targets ($y$): the thing you're predicting. House price, spam/not-spam.
- Training set: the data you learn from.
- Test set: data held out to evaluate how well your model generalizes.
- Model: the function with its learned parameters.
- Parameters: the numbers the algorithm adjusts during training (weights $w$, bias $b$).
- Hyperparameters: settings you choose before training (learning rate, number of layers, regularization strength).
- Loss function: measures how bad your predictions are. Lower is better. You’ll see MSE for regression and cross-entropy for classification.
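To make "loss function" concrete, here is what MSE and binary cross-entropy compute on toy numbers (both get full treatment later in the series; the function names are just for this sketch):

```python
import math

def mse(preds, targets):
    """Mean squared error: average of squared prediction errors."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def binary_cross_entropy(probs, labels):
    """Average negative log-likelihood of the true 0/1 labels."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(probs, labels)
    ) / len(probs)

# Regression: predicting 3 when the truth is 4 costs (3 - 4)^2 = 1,
# averaged over all three examples.
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.333...

# Classification: confident correct predictions give low loss,
# confident wrong ones give high loss.
good = binary_cross_entropy([0.9, 0.1], [1, 0])
bad = binary_cross_entropy([0.1, 0.9], [1, 0])
```

Either way, lower is better, and "training" is just the search for parameters that drive these numbers down.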
How a model actually learns
Let's make the learning process concrete. Consider a linear model for a single feature: $\hat{y} = w x$. The model has one parameter, $w$. Training means finding the $w$ that minimizes the loss.

Start with a random $w$, compute the loss, then ask: "If I increase $w$ slightly, does the loss go up or down?" That's the gradient. If the gradient is positive, decreasing $w$ reduces the loss. If negative, increasing $w$ helps.

The update rule for gradient descent is:

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}$$

where $\eta$ is the learning rate, a small positive number that controls step size. Too large and you overshoot. Too small and training takes forever.
This loop repeats: predict, compute loss, compute gradient, update. Each pass through the full training set is called an epoch. Depending on the model and the learning rate, convergence can take anywhere from a handful of epochs to many thousands.
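The whole loop fits in a few lines. Here is a sketch using the house-price data from Example 1 (the learning rate is tiny because the raw sizes are in the hundreds; feature scaling, covered later in the series, is the usual fix):

```python
# House data from Example 1: (size in sq ft, price in $k).
data = [(800, 160), (1000, 200), (1200, 240), (1500, 300)]

w = 0.1     # initial guess
lr = 1e-7   # learning rate (eta), tiny because inputs are unscaled

for epoch in range(200):
    # Gradient of the MSE loss with respect to w:
    # L = mean((w*x - y)^2)  =>  dL/dw = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # step downhill

print(round(w, 4))  # converges to 0.2
```

Each iteration is exactly the cycle described above: predict, measure the error, follow the gradient downhill. After 200 epochs $w$ has settled at 0.2, the same rule the closed-form solution and the human eyeball found.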
The ML workflow at a high level
Every ML project follows roughly the same steps:
- Define the problem. What are you predicting? What data do you have?
- Collect and prepare data. Clean it, handle missing values, split into train/test.
- Choose a model. Start simple (linear regression, logistic regression) and increase complexity as needed.
- Train. Feed data to the algorithm, let it adjust parameters.
- Evaluate. Check performance on held-out test data.
- Iterate. Try different features, models, hyperparameters.
```mermaid
flowchart LR
    A[Define Problem] --> B[Prepare Data]
    B --> C[Choose Model]
    C --> D[Train]
    D --> E[Evaluate]
    E -->|Not good enough| B
    E -->|Good enough| F[Deploy]
```
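Step 2's train/test split is worth seeing once in code. A minimal sketch on a hypothetical ten-row dataset (the data and the 80/20 ratio are illustrative; the next article covers splitting properly):

```python
import random

# Hypothetical dataset: 10 (features, label) rows.
rows = [([i, i * 2], i % 2) for i in range(10)]

# Shuffle before splitting so the split isn't biased by row order,
# and seed the RNG so the split is reproducible.
random.seed(42)
random.shuffle(rows)

split = int(0.8 * len(rows))  # 80/20 holdout split
train_rows, test_rows = rows[:split], rows[split:]
```

The model only ever sees `train_rows`; `test_rows` stays locked away until evaluation, which is what makes the test score an honest estimate of performance on new data.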
What this series covers
This series builds ML from the ground up. Each article includes the math, worked examples, and code. Here’s the roadmap:
- What is ML (you are here)
- Data, features, and the ML pipeline - how to prepare data properly
- Linear regression - your first ML algorithm, solved two ways
- Bias and variance - why models fail and how to diagnose it
- Regularization - preventing overfitting with Ridge, Lasso, and ElasticNet
- Logistic regression - moving from regression to classification
Later articles will cover decision trees, SVMs, neural networks, and more. The math prerequisites are covered in companion series on calculus, linear algebra, and optimization.
What comes next
The next article, Data, features, and the ML pipeline, covers how to prepare your data for ML: splitting datasets, scaling features, and avoiding data leakage. Getting data right is half the battle.