In Unsupervised Learning, the training data is not labelled. The Machine Learning algorithm is trained on this unlabelled data and, at the end of training, groups or categorises it according to similarities, patterns, and differences.
This type of Machine Learning helps group and organise data so that you can step in afterwards and make sense of the resulting groups.
A practical example is training a Machine Learning algorithm with different pictures of various fruits. The algorithm finds similarities and patterns among these pictures and is able to group the fruits based on those similarities and patterns.
Key Components
1. Input Data
Unsupervised learning algorithms operate on raw input data, which can be in the form of features, images, text, or any other format relevant to the task.
2. Objective
Unsupervised learning typically aims to discover the inherent structure or patterns in the data. Unlike supervised learning, there are no predefined labels or specific outputs that the algorithm seeks to predict.
3. Algorithms
Unsupervised learning algorithms are responsible for finding patterns or representations in the data. Common types of unsupervised learning algorithms include clustering algorithms (e.g., K-Means, Hierarchical Clustering), dimensionality reduction techniques (e.g., PCA, t-SNE), and density estimation methods (e.g., Gaussian Mixture Models).
4. Clustering
Clustering is a central task in unsupervised learning. It involves grouping similar data points together into clusters based on certain criteria. Clustering algorithms aim to identify natural groupings within the data.
5. Dimensionality Reduction
Dimensionality reduction techniques are used to reduce the number of features or dimensions in the data while preserving essential information. This helps in visualising high-dimensional data and capturing its intrinsic structure.
6. Density Estimation
Density estimation methods model the underlying probability distribution of the data. Gaussian Mixture Models (GMM) are an example of a density estimation technique often used in unsupervised learning.
7. Feature Learning
Feature learning involves automatically learning useful representations or features from the raw data. Auto-encoders and deep learning architectures are commonly used for feature learning in unsupervised settings.
8. Anomaly Detection
Unsupervised learning can be applied to identify anomalies or outliers in the data. Algorithms like Isolation Forests and One-Class SVM are commonly used for anomaly detection.
9. Representation Learning
Representation learning focuses on learning efficient and meaningful representations of the input data. This is particularly important in tasks where the underlying structure of the data needs to be captured.
10. Evaluation Metrics
While unsupervised learning doesn't have traditional accuracy metrics (as there are no labeled outputs), it often relies on evaluation measures specific to the task. For clustering, metrics like silhouette score or Davies-Bouldin index may be used.
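Here is a minimal sketch of these two metrics with scikit-learn, evaluating a K-Means run on synthetic data (the dataset and K=3 are assumptions for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Toy 2-D data with 3 natural groupings.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Silhouette score: ranges from -1 to 1, higher is better.
print(silhouette_score(X, labels))

# Davies-Bouldin index: lower is better.
print(davies_bouldin_score(X, labels))
```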
11. Visualisation Techniques
Visualisation is crucial in unsupervised learning for understanding and interpreting the discovered patterns. Techniques like t-Distributed Stochastic Neighbour Embedding (t-SNE) are commonly used for visualising high-dimensional data in lower dimensions.
12. Preprocessing
Data preprocessing steps, such as normalisation, scaling, and handling missing values, are still important in unsupervised learning to ensure the effectiveness of algorithms and the quality of discovered patterns.
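As a small sketch of a typical preprocessing pipeline in scikit-learn (the toy matrix and the mean-imputation strategy are assumptions for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, np.nan],   # missing value to be imputed
              [3.0, 400.0]])

# Fill missing values with the column mean, then scale to zero mean / unit variance.
prep = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_clean = prep.fit_transform(X)
print(X_clean)
```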
Understanding these key components is essential when applying unsupervised learning techniques to real-world problems. The choice of algorithm depends on the characteristics of the data and the specific goals of the analysis.
Commonly Used Algorithms
1. K-Means Clustering
Type: Clustering
Use: Grouping data points into K clusters based on similarities in feature space.
Example: Customer segmentation for targeted marketing.
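A minimal K-Means sketch with scikit-learn on synthetic data (the blobs dataset and K=3 stand in for real customer features):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy 2-D data with 3 natural groupings.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means with K=3; n_init=10 restarts from 10 random initialisations.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels[:10])              # cluster assignment per point
print(kmeans.cluster_centers_)  # learned centroids
```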
2. Hierarchical Clustering
Type: Clustering
Use: Building a tree-like hierarchy of clusters.
Example: Taxonomy creation based on genetic similarities in species.
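A small sketch using SciPy's hierarchical clustering utilities (the synthetic data and Ward linkage are illustrative choices):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

# Build the linkage matrix with Ward's method (merges that minimise variance).
Z = linkage(X, method="ward")

# Cut the tree into 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

dendrogram(Z)  # visualise the merge hierarchy
plt.show()
```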
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Type: Clustering
Use: Identifying clusters based on data point density.
Example: Identifying hotspots of criminal activity in a city.
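A minimal DBSCAN sketch with scikit-learn; the half-moons dataset shows the kind of non-spherical shape density-based clustering handles well (eps and min_samples are assumptions to tune for real data):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: a shape K-Means handles poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps is the neighbourhood radius; min_samples is the density threshold.
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

# Points labelled -1 are treated as noise rather than forced into a cluster.
print(set(labels))
```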
4. Principal Component Analysis (PCA)
Type: Dimensionality Reduction
Use: Transforming high-dimensional data into a lower-dimensional space.
Example: Reducing facial features dimensions for facial recognition.
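A minimal PCA sketch with scikit-learn, compressing the 64-dimensional digits dataset (a stand-in for face features) down to 2 components:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 64-dimensional digit images reduced to 2 principal components.
X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                    # (1797, 2)
print(pca.explained_variance_ratio_)  # variance captured per component
```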
5. t-Distributed Stochastic Neighbour Embedding (t-SNE)
Type: Dimensionality Reduction
Use: Visualising high-dimensional data in two or three dimensions.
Example: Visualising relationships between different types of documents.
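A minimal t-SNE sketch with scikit-learn on the digits dataset (perplexity=30 is a common default, not a universal setting; the labels are used only to colour the plot, not by t-SNE itself):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

# perplexity balances local vs. global structure in the embedding.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)

plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, s=5)
plt.show()
```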
6. Auto-encoders
Type: Dimensionality Reduction, Feature Learning
Use: Learning a compressed representation of input data.
Example: Anomaly detection in credit card transactions.
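Real auto-encoders are usually built with deep learning frameworks such as PyTorch or Keras; as a lightweight stand-in, the sketch below trains a scikit-learn MLP to reconstruct its own input through a narrow bottleneck layer. The random data, bottleneck size, and 99th-percentile cutoff are all assumptions for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))          # stand-in for transaction features
X = StandardScaler().fit_transform(X)

# An MLP trained to reproduce its own input; the narrow hidden layer
# acts as the compression bottleneck.
autoencoder = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=42)
autoencoder.fit(X, X)

# Reconstruction error per sample; unusually large errors flag anomalies.
errors = np.mean((X - autoencoder.predict(X)) ** 2, axis=1)
threshold = np.percentile(errors, 99)   # assumed cutoff for illustration
print(np.where(errors > threshold)[0])
```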
7. Gaussian Mixture Models (GMM)
Type: Clustering, Density Estimation
Use: Modelling data as a mixture of Gaussian distributions.
Example: Identifying different species based on biometric measurements.
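A minimal GMM sketch with scikit-learn, using the iris measurements as a stand-in for biometric data (3 components assumed):

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X, _ = load_iris(return_X_y=True)

gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X)

print(gmm.predict(X)[:10])       # hard cluster assignments
print(gmm.predict_proba(X)[:3])  # soft membership probabilities
```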
8. Apriori Algorithm
Type: Association Rule Learning
Use: Discovering frequent item-sets in transactional databases.
Example: Market basket analysis to identify co-purchased items.
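A small sketch assuming the third-party mlxtend library is installed; the four-basket table and 0.5 support threshold are made up for illustration:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori

# One-hot encoded transactions: each row is a basket, each column an item.
baskets = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0],
     [0, 1, 1, 1],
     [1, 1, 1, 1]],
    columns=["bread", "milk", "eggs", "butter"],
).astype(bool)

# Item-sets appearing in at least half of all baskets.
frequent = apriori(baskets, min_support=0.5, use_colnames=True)
print(frequent)
```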
9. Mean-Shift Clustering
Type: Clustering
Use: Identifying dense regions in the feature space.
Example: Image segmentation based on colour similarity.
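A minimal Mean-Shift sketch with scikit-learn on synthetic data (real image segmentation would cluster pixel colour values instead):

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Bandwidth controls the size of the kernel window; estimate it from the data.
bandwidth = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bandwidth)
labels = ms.fit_predict(X)

# Unlike K-Means, the number of clusters is discovered, not specified.
print(len(ms.cluster_centers_))
```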
10. K-Nearest Neighbours (KNN)
Type: Similarity Search, Density Estimation
Use: Finding the k most similar data points to a given point. (Note that classic KNN classification, which takes a majority vote among labelled neighbours, is a supervised method; the unsupervised part is the nearest-neighbour search itself.)
Example: Recommender systems based on similar user behaviour.
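A minimal sketch using scikit-learn's NearestNeighbors on a made-up user-item rating matrix (the random matrix and cosine metric are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
ratings = rng.random((100, 20))  # stand-in for a user-item rating matrix

# Index users by cosine similarity of their rating vectors.
nn = NearestNeighbors(n_neighbors=5, metric="cosine")
nn.fit(ratings)

# The 5 users most similar to user 0 (the first hit is user 0 itself).
distances, indices = nn.kneighbors(ratings[0:1])
print(indices)
```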
11. Isolation Forest
Type: Anomaly Detection
Use: Detecting anomalies using an ensemble of decision trees.
Example: Identifying defective products in manufacturing.
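A minimal Isolation Forest sketch with scikit-learn; the simulated "normal" and "defect" measurements and the contamination rate are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=0, scale=1, size=(300, 4))  # in-spec measurements
defects = rng.normal(loc=5, scale=1, size=(5, 4))   # simulated defects
X = np.vstack([normal, defects])

# contamination is the assumed fraction of anomalies in the data.
iso = IsolationForest(contamination=0.02, random_state=42)
preds = iso.fit_predict(X)  # -1 = anomaly, 1 = normal

print(np.where(preds == -1)[0])
```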
12. Word Embeddings (Word2Vec, GloVe)
Type: Feature Learning
Use: Learning distributed representations of words based on their context.
Example: Finding semantically similar words in natural language processing.
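A small sketch assuming the gensim library (4.x API) is installed; the three-sentence corpus is a toy example, as real embeddings need millions of tokens:

```python
from gensim.models import Word2Vec

# Tiny toy corpus of tokenised sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# vector_size, window, and min_count are the key hyperparameters.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=42)

# Words whose learned vectors are closest to "cat".
print(model.wv.most_similar("cat", topn=3))
```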
These examples showcase the versatility of unsupervised learning algorithms in addressing various tasks across different domains. The choice of algorithm depends on the specific goals and characteristics of the data at hand.