PCA: A Simple Way to See Data Differently
A Simple Unsupervised Learning Method to Reduce the Dimensionality of Your Data
Why Fewer Dimensions Can Tell the Full Story!
One of the things I love about data science is that a lot of the “scary-sounding” methods are actually simple ideas once you set aside the maths.
Let’s take one of the most important of them: Principal Component Analysis (a.k.a. ‘PCA’).
It may sound technical, but at its heart, PCA is just a way to look at data from a new angle.
PCA: The problem it solves
Imagine you’ve got two variables that basically tell you the same story.
Example:
x1 = the weight of something
x2 = almost the same as x1, just with a little noise added
If you plot them, you’ll see that the two are highly correlated: the points fall almost on a straight line. So do you really need both? Not really.
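If you want to see that correlation for yourself, here is a minimal sketch with NumPy. The numbers are made up for illustration (a "weight" centred on 70 with a spread of 10, plus noise with a spread of 1); they are my assumptions, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: x1 is a "weight", x2 is x1 plus a little noise.
x1 = rng.normal(loc=70.0, scale=10.0, size=500)
x2 = x1 + rng.normal(loc=0.0, scale=1.0, size=500)

# Pearson correlation between the two variables (1.0 would be a perfect line).
corr = np.corrcoef(x1, x2)[0, 1]
print(f"correlation between x1 and x2: {corr:.3f}")
```

With noise this small relative to the spread of x1, the correlation should come out very close to 1.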
This is where PCA steps in. It says:
“What if we rotate the coordinate system so we capture the main variation with fewer dimensions?”
In other words, PCA lets you keep the directions that carry most of the variation and drop the ones that carry very little. In our example, the direction you drop is mostly the noise.
How it works (intuition only)
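PCA takes your variables and finds new axes (principal components), each one a weighted combination of the original variables.
These new axes are built so that the first one explains the most variance in the data.
The second one explains as much as possible of what’s left, while being orthogonal to the first, and so on for any further components.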
In a 2D dataset, that means instead of dealing with x1 and x2, you might only need PC1 to capture almost everything important.
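For the curious, the intuition above can be sketched in a few lines of NumPy. This is one common way to compute the axes (eigenvectors of the covariance matrix), not the only one, and the helper name `pca_axes` and the toy x1/x2 data are my own choices for illustration.

```python
import numpy as np

def pca_axes(X):
    """Return the principal axes (as columns) and each axis's share of the variance."""
    Xc = X - X.mean(axis=0)                 # centre each variable on zero
    cov = np.cov(Xc, rowvar=False)          # covariance matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh handles symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort so the largest variance comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs, eigvals / eigvals.sum()

# Hypothetical data: x2 is x1 plus a little noise.
rng = np.random.default_rng(0)
x1 = rng.normal(70.0, 10.0, size=500)
x2 = x1 + rng.normal(0.0, 1.0, size=500)
X = np.column_stack([x1, x2])

axes, ratio = pca_axes(X)
print("PC1 direction:", axes[:, 0])
print("share of variance per component:", ratio)
```

PC1 should point roughly along the diagonal (equal weight on x1 and x2, up to sign) and take nearly all of the variance, which is the "you might only need PC1" idea in numbers.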
But why does it matter?
PCA shows up all over machine learning and AI workflows, so it’s an important concept to be aware of. You’ll mostly meet it in three roles:
Dimensionality reduction: when you’ve got 100 variables but only 5 really matter
Visualisation: when you want to plot messy, high-dimensional data in 2D or 3D
Noise filtering: when you want the clean signal, not the random clutter
Think of PCA as a compression tool for your data. It keeps what matters and throws away the rest 😉
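To make the compression idea concrete, here is a small sketch: keep only the PC1 score for each row, then rebuild the two original columns from it and see how much was lost. It uses the same made-up x1/x2 data as an assumption, and it gets the axes from an SVD of the centred data, which is another standard route to the same components.

```python
import numpy as np

# Hypothetical data: x2 is x1 plus a little noise.
rng = np.random.default_rng(0)
x1 = rng.normal(70.0, 10.0, size=500)
x2 = x1 + rng.normal(0.0, 1.0, size=500)
X = np.column_stack([x1, x2])

mean = X.mean(axis=0)
Xc = X - mean

# SVD of the centred data: the rows of Vt are the principal axes, largest variance first.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

scores = Xc @ pc1                      # "compress": one number per row instead of two
X_back = np.outer(scores, pc1) + mean  # "decompress": back to the original two columns

rmse = np.sqrt(np.mean((X - X_back) ** 2))
print("values stored per row:", 1, "instead of", X.shape[1])
print(f"reconstruction error (RMSE): {rmse:.3f}, spread of x1: {x1.std():.1f}")
```

The reconstruction error should be small next to the spread of the data. One caveat: PCA does not know what is noise and what is signal. It only drops the directions with little variance, so this works as a noise filter when the noise is the small part.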
My takeaway
You don’t need to master the maths (and certainly not now). What’s valuable is to understand the intuition: PCA is about finding better ways to represent data so we can see patterns more clearly.
This week, do it yourself 💪
Take a dataset with several variables (Iris, Wine, or even your own).
Run PCA and plot the first two components.
See how much structure you can capture in just two dimensions.
That simple experience alone will stick in your head, and you will never forget what PCA really does.
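If you’d like a starting point, here is a minimal sketch using scikit-learn and its built-in Iris dataset. Standardising the columns first is my assumption (a common choice when variables are on different scales), not something PCA requires.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target  # 150 flowers, 4 measurements each

# Put every measurement on the same scale so no single column dominates the variance.
X_scaled = StandardScaler().fit_transform(X)

# Keep only the first two principal components.
pca = PCA(n_components=2)
Z = pca.fit_transform(X_scaled)

kept = pca.explained_variance_ratio_.sum()
print("shape before:", X.shape, "after:", Z.shape)
print(f"share of variance kept in two components: {kept:.1%}")
```

To plot it, pass `Z[:, 0]` and `Z[:, 1]` to matplotlib’s `plt.scatter` with `c=y`, so each species gets its own colour. Swapping `load_iris` for `load_wine` gives you the Wine version of the same experiment.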