Clustering

𝐏𝐨𝐰𝐞𝐫 𝐨𝐟 𝐇𝐢𝐞𝐫𝐚𝐫𝐜𝐡𝐢𝐜𝐚𝐥 𝐂𝐥𝐮𝐬𝐭𝐞𝐫𝐢𝐧𝐠: 𝐀 𝐆𝐮𝐢𝐝𝐞 𝐭𝐨 𝐃𝐚𝐭𝐚 𝐆𝐫𝐨𝐮𝐩𝐢𝐧𝐠

In the world of Data Science, Hierarchical Clustering stands out for its elegance and versatility. This powerful method helps group similar data points, uncover hidden patterns, and explore relationships within datasets. 🌐

🔑 𝐖𝐡𝐚𝐭 𝐢𝐬 𝐇𝐢𝐞𝐫𝐚𝐫𝐜𝐡𝐢𝐜𝐚𝐥 𝐂𝐥𝐮𝐬𝐭𝐞𝐫𝐢𝐧𝐠?

Hierarchical Clustering is an unsupervised learning technique that builds a tree of clusters, which is drawn as a dendrogram. The most common variant, agglomerative (bottom-up) clustering, progressively merges smaller clusters into larger ones. Here's how it works:

𝐒𝐭𝐚𝐫𝐭 𝐰𝐢𝐭𝐡 𝐈𝐧𝐝𝐢𝐯𝐢𝐝𝐮𝐚𝐥 𝐃𝐚𝐭𝐚 𝐏𝐨𝐢𝐧𝐭𝐬: Initially, each data point is treated as its own cluster.

𝐌𝐞𝐚𝐬𝐮𝐫𝐞 𝐃𝐢𝐬𝐭𝐚𝐧𝐜𝐞𝐬: The distance between every pair of clusters is calculated using a point-to-point metric (such as Euclidean distance) combined with a linkage rule.

𝐌𝐞𝐫𝐠𝐞 𝐭𝐡𝐞 𝐂𝐥𝐨𝐬𝐞𝐬𝐭 𝐂𝐥𝐮𝐬𝐭𝐞𝐫𝐬: The two closest clusters are merged, and the measure-and-merge cycle repeats until all points belong to a single cluster.

𝐕𝐢𝐬𝐮𝐚𝐥𝐢𝐳𝐞 𝐰𝐢𝐭𝐡 𝐚 𝐃𝐞𝐧𝐝𝐫𝐨𝐠𝐫𝐚𝐦: The dendrogram visually shows how clusters merge and at what distance.
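
The steps above can be sketched in a few lines of plain Python. This is a minimal, deliberately naive illustration rather than a production implementation: the function name agglomerative, its linkage parameter, and the five sample points are all made up for this example, and Euclidean distance is assumed as the point-to-point metric.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, linkage=min):
    """Naive agglomerative clustering; returns the merge history.

    `linkage` reduces the pairwise point distances between two clusters
    to a single number: min gives single linkage, max gives complete linkage.
    """
    # Step 1: every point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Step 2: measure the distance between every pair of clusters.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        # Step 3: merge the closest pair and record the merge distance.
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical sample data: two tight pairs and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
history = agglomerative(points)
for left, right, d in history:
    print(f"merge {left} + {right} at distance {d:.2f}")
```

The recorded merge distances are exactly the heights a dendrogram would draw. Recomputing every pairwise distance on each pass keeps the sketch short but makes it slow on large datasets; library implementations update distances incrementally instead.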

🔍 𝐋𝐢𝐧𝐤𝐚𝐠𝐞 𝐌𝐞𝐭𝐡𝐨𝐝𝐬: 𝐂𝐡𝐨𝐨𝐬𝐢𝐧𝐠 𝐘𝐨𝐮𝐫 𝐃𝐢𝐬𝐭𝐚𝐧𝐜𝐞 𝐌𝐞𝐭𝐫𝐢𝐜

The result of hierarchical clustering depends on the linkage method, which defines the distance between two clusters 𝐀 and 𝐁 in terms of the point-to-point distance 𝐝(𝐢, 𝐣). Here are the most common methods:

𝐀𝐯𝐞𝐫𝐚𝐠𝐞 𝐋𝐢𝐧𝐤𝐚𝐠𝐞

What it does: Calculates the average distance over all pairs of points, one from each cluster.

Formula:

𝐃_𝐚𝐯𝐠(𝐀, 𝐁) = (1 / (|𝐀| * |𝐁|)) * Σ (𝐢 ∈ 𝐀) Σ (𝐣 ∈ 𝐁) 𝐝(𝐢, 𝐣)

𝐒𝐢𝐧𝐠𝐥𝐞 𝐋𝐢𝐧𝐤𝐚𝐠𝐞

What it does: Measures the shortest distance between any two points, one from each cluster.

Formula:

𝐃_𝐬𝐢𝐧𝐠𝐥𝐞(𝐀, 𝐁) = 𝐦𝐢𝐧(𝐢 ∈ 𝐀, 𝐣 ∈ 𝐁) 𝐝(𝐢, 𝐣)

𝐂𝐨𝐦𝐩𝐥𝐞𝐭𝐞 𝐋𝐢𝐧𝐤𝐚𝐠𝐞

What it does: Measures the farthest distance between any two points, one from each cluster.

Formula:

𝐃_𝐜𝐨𝐦𝐩𝐥𝐞𝐭𝐞(𝐀, 𝐁) = 𝐦𝐚𝐱(𝐢 ∈ 𝐀, 𝐣 ∈ 𝐁) 𝐝(𝐢, 𝐣)

𝐖𝐚𝐫𝐝’𝐬 𝐋𝐢𝐧𝐤𝐚𝐠𝐞

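What it does: Merges the pair of clusters whose union gives the smallest increase in within-cluster variance.

Formula:

𝐃_𝐖𝐚𝐫𝐝(𝐀, 𝐁) = (|𝐀| * |𝐁|) / (|𝐀| + |𝐁|) * 𝐝(𝐜_𝐀, 𝐜_𝐁)²

where 𝐜_𝐀 and 𝐜_𝐁 are the centroids of 𝐀 and 𝐁, and 𝐝 is the Euclidean distance.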
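
To make the four linkage formulas concrete, here is a small sketch that evaluates each one on two tiny clusters. The function names and the sample clusters A and B are hypothetical, Euclidean distance is assumed for d(i, j), and the Ward version assumes the usual reading of the formula: the squared Euclidean distance between the two cluster centroids, weighted by cluster sizes.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(p[k] for p in cluster) / n for k in range(len(cluster[0])))


def single_linkage(A, B):
    # Shortest distance between any point in A and any point in B.
    return min(dist(i, j) for i in A for j in B)


def complete_linkage(A, B):
    # Farthest distance between any point in A and any point in B.
    return max(dist(i, j) for i in A for j in B)


def average_linkage(A, B):
    # Sum over all |A| * |B| cross-cluster pairs, divided by the pair count.
    return sum(dist(i, j) for i in A for j in B) / (len(A) * len(B))


def ward_linkage(A, B):
    # Assumption: size-weighted squared distance between the centroids.
    weight = len(A) * len(B) / (len(A) + len(B))
    return weight * dist(centroid(A), centroid(B)) ** 2


# Hypothetical clusters: two vertical pairs, three units apart.
A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0), (3.0, 2.0)]

for name, fn in [("single", single_linkage), ("complete", complete_linkage),
                 ("average", average_linkage), ("ward", ward_linkage)]:
    print(f"{name:>8}: {fn(A, B):.4f}")
```

For any pair of clusters, the single-linkage value is never larger than the average, and the average is never larger than the complete-linkage value, which is one reason the three methods can produce different trees on the same data.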

🌳 𝐓𝐡𝐞 𝐃𝐞𝐧𝐝𝐫𝐨𝐠𝐫𝐚𝐦: 𝐀 𝐕𝐢𝐬𝐮𝐚𝐥 𝐆𝐮𝐢𝐝𝐞 𝐭𝐨 𝐘𝐨𝐮𝐫 𝐂𝐥𝐮𝐬𝐭𝐞𝐫𝐬

The dendrogram is one of the most powerful aspects of hierarchical clustering. This tree-like diagram illustrates how clusters merge and provides a clear view of cluster similarities at different levels. The height of each branch shows the distance at which two clusters were merged. Cutting the tree at a chosen height yields a flat set of clusters, and a large gap between successive merge heights suggests a natural number of clusters.
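
As a rough illustration of reading cluster labels off a dendrogram, the sketch below uses SciPy's scipy.cluster.hierarchy module, assuming NumPy and SciPy are installed. The sample points and the cut height of 3.0 are arbitrary choices for this example, not recommendations.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical sample data: two tight pairs and one outlier.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]])

# Build the merge tree with average linkage (Euclidean distance by default).
# Each row of Z holds: cluster a, cluster b, merge distance, new cluster size.
Z = linkage(X, method="average")

# Column 2 contains the heights at which the dendrogram branches join.
print("merge heights:", np.round(Z[:, 2], 2))

# Cut the tree at height 3.0: merges above that distance are undone.
labels = fcluster(Z, t=3.0, criterion="distance")
print("cluster labels:", labels)
```

With matplotlib available, scipy.cluster.hierarchy.dendrogram(Z) draws the tree itself. Here the two tight pairs merge at a small height and everything else merges much higher, so a cut anywhere in that gap gives the same three groups.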

🚀 𝐖𝐡𝐲 𝐔𝐬𝐞 𝐇𝐢𝐞𝐫𝐚𝐫𝐜𝐡𝐢𝐜𝐚𝐥 𝐂𝐥𝐮𝐬𝐭𝐞𝐫𝐢𝐧𝐠?

Hierarchical clustering is well suited to datasets where the structure isn’t immediately obvious, because it doesn’t require fixing the number of clusters in advance. Common applications include:

𝐂𝐮𝐬𝐭𝐨𝐦𝐞𝐫 𝐒𝐞𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧: Grouping customers based on behaviours.

𝐃𝐨𝐜𝐮𝐦𝐞𝐧𝐭 𝐂𝐥𝐮𝐬𝐭𝐞𝐫𝐢𝐧𝐠: Organizing documents into topics.

𝐁𝐢𝐨𝐢𝐧𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐜𝐬: Classifying genes or proteins with similar functions.

𝐌𝐚𝐫𝐤𝐞𝐭 𝐑𝐞𝐬𝐞𝐚𝐫𝐜𝐡: Identifying patterns in consumer behaviour.
