Regression and Classification: The Two Building Blocks of Data Science
Understanding the two fundamental ways we model data (predicting numbers and predicting categories) and why they still matter in today’s AI era
When we talk about data science, it often feels like there’s a never-ending list of algorithms and buzzwords to keep up with. But at the heart of it, two ideas show up again and again: regression and classification.
One of the most important things we can do with data is model relationships between variables.
Two of the most common ways to do that are regression and classification.
Regression: predicting numbers
Regression is about predicting a numeric outcome.
Formally, we can write this as:
Y = f(X) + \epsilon
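Y is the target (what we want to predict).
X is the set of predictors (the inputs, or features).
f is the unknown relationship between X and Y that we try to estimate.
ε is the random error: the part of Y that X can’t explain.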
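To make the formula concrete, here is a minimal simulation, assuming a hypothetical “true” f(x) = 1 + 2x and Gaussian noise with standard deviation 0.5 (both chosen purely for illustration). Subtracting f(X) from each observed Y leaves only ε, whose average sits near zero.

```python
import random

def f(x):
    # Hypothetical "true" relationship, chosen only for illustration
    return 1.0 + 2.0 * x

def simulate(n, noise_sd=0.5, seed=0):
    """Draw n (x, y) pairs from Y = f(X) + epsilon, with Gaussian epsilon."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

xs, ys = simulate(1000)

# Removing the systematic part f(X) leaves only the noise term epsilon
errors = [y - f(x) for x, y in zip(xs, ys)]
print(sum(errors) / len(errors))  # close to 0, since epsilon is drawn with mean 0
```

In real data we never see f directly; the whole job of regression is to estimate it from the (X, Y) pairs alone.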
A classic example is predicting diamond prices from carat weight.
Fit a line to the data, and you get a reasonable estimate of price for a given carat weight.
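Here is a minimal sketch of “fitting a line” with ordinary least squares in plain Python. The carat and price numbers are made up for illustration (they are not real diamond data), and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up toy data: carat weight and price in USD (illustrative only)
carats = [0.3, 0.5, 0.7, 1.0, 1.2]
prices = [500, 1400, 2300, 4200, 5100]

intercept, slope = fit_line(carats, prices)
estimate = intercept + slope * 0.8  # estimated price of a 0.8-carat stone
print(f"price ≈ {intercept:.1f} + {slope:.1f} × carat; 0.8 ct ≈ {estimate:.0f}")
```

A fitted line is only trustworthy within the range of data it was fit on: with these toy numbers the intercept comes out negative, so “predicting” the price of a 0-carat diamond would be meaningless.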
But regression isn’t only about prediction; it also helps us understand relationships. For instance, knowing how much cholesterol levels change as a drug dosage increases can be far more useful than a single predicted number.
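If we assume f is linear (an assumption, not the only option) and introduce coefficients β₀ and β₁ that are not in the formula above, the slope is exactly the “how much does Y change” quantity the dosage example cares about:

```latex
Y = \beta_0 + \beta_1 X + \epsilon, \qquad \mathbb{E}[\epsilon \mid X] = 0
\quad\Longrightarrow\quad
\mathbb{E}[Y \mid X = x + 1] - \mathbb{E}[Y \mid X = x] = \beta_1
```

Read this way, β₁ is the expected change in cholesterol for a one-unit increase in dosage, and its sign tells us the direction of the effect.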
Classification: predicting categories
Classification is similar, but the target is categorical rather than numeric.
Take Fisher’s well-known iris dataset. From four flower measurements (sepal length, sepal width, petal length, and petal width), we can train a model to predict whether a new flower is setosa, versicolor, or virginica.
Instead of a continuous number, the model outputs the class the new flower most likely belongs to (many classifiers also report a probability for each class).
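As a minimal sketch, here is a nearest-centroid classifier, one of the simplest classifiers there is, swapped in purely for illustration. The measurements are made-up, iris-like values rather than rows from the actual dataset; in practice you would load the real data (for example with scikit-learn’s `load_iris`). The helper names `fit_centroids` and `predict` are hypothetical.

```python
import math

# Made-up, iris-like measurements in cm: (sepal length, sepal width, petal length, petal width).
# Illustrative values only, NOT rows copied from Fisher's dataset.
training = [
    ((5.0, 3.4, 1.5, 0.2), "setosa"),
    ((4.8, 3.1, 1.6, 0.2), "setosa"),
    ((5.2, 3.6, 1.4, 0.3), "setosa"),
    ((6.0, 2.8, 4.3, 1.3), "versicolor"),
    ((5.7, 2.7, 4.1, 1.2), "versicolor"),
    ((6.4, 3.0, 4.6, 1.5), "versicolor"),
    ((6.7, 3.0, 5.7, 2.2), "virginica"),
    ((6.3, 2.9, 5.5, 2.0), "virginica"),
    ((7.1, 3.1, 6.0, 2.3), "virginica"),
]

def fit_centroids(samples):
    """'Train' a nearest-centroid classifier: average each class's feature vectors."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

centroids = fit_centroids(training)
print(predict(centroids, (5.1, 3.5, 1.4, 0.2)))  # setosa
print(predict(centroids, (6.1, 2.9, 4.5, 1.4)))  # versicolor
print(predict(centroids, (6.9, 3.1, 5.8, 2.1)))  # virginica
```

Note that the output is a label, not a number. Nearest-centroid is used here only because it fits in a few lines; k-nearest neighbors, logistic regression, and decision trees are common alternatives.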
Why this still matters
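With all the attention on large language models and cutting-edge AI, it’s tempting to dismiss regression and classification as old news. But much of what we build today still rests on these foundations; even a language model’s next-token prediction is, at its core, a classification over its vocabulary.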
Forecasting, pricing, and scoring rely on regression.
Fraud detection, spam filters, and medical diagnosis rely on classification.
If you understand these fundamentals (both the math and the intuition), you’ll not only use today’s AI tools more effectively but also adapt more easily when the next wave of tools comes along.
A simple next step
Pick a small dataset this week and run a regression or a classification yourself. Thirty minutes is enough; what matters is starting now rather than putting it off. Don’t chase a perfect model; focus on learning.
In the end, engineers who combine a clear grasp of fundamentals with modern tooling will stay in demand! So, what are you waiting for? Invest in this skill today and review what you’ve learned at the end of the week 🚀
More to come on my LinkedIn! Make sure to hit that follow button so you don’t miss out! 🔥