Data Loading and Preprocessing:
Load the dataset Cereals.csv using pandas.
Remove cereals with missing values using dropna().
import pandas as pd
# Load the dataset
data = pd.read_csv('Cereals.csv')
# Remove cereals with missing values
data.dropna(inplace=True)
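It helps to see what dropna() removes before applying it to the real file. The minimal sketch below uses a small hypothetical DataFrame (the column names and values are made up, not taken from Cereals.csv). It counts missing values per column and then drops every row that contains at least one missing value, which is dropna()'s default behavior (how='any').

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Cereals.csv; names and values are illustrative only
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "calories": [70, 120, np.nan, 110],
    "sugars": [6, 8, 5, np.nan],
})

# Count missing values per column before dropping anything
missing_per_column = df.isna().sum()
print(missing_per_column)

# With the default how='any', dropna() removes every row with at least one NaN
cleaned = df.dropna()
print(f"Kept {len(cleaned)} of {len(df)} rows")
```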
Normalization of Data:
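Normalize the numeric features so they are all on the same scale (e.g., using MinMaxScaler or StandardScaler from scikit-learn). Without this step, features with large numeric ranges dominate the Euclidean distances used by the clustering.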
from sklearn.preprocessing import MinMaxScaler
# Select the numeric columns for clustering and normalize them
# (replace with an explicit list of column names to cluster on a subset)
columns_for_clustering = data.select_dtypes(include='number').columns.tolist()
scaler = MinMaxScaler()
data_normalized = scaler.fit_transform(data[columns_for_clustering])
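MinMaxScaler rescales each column independently to the range [0, 1] using (x - min) / (max - min). The sketch below checks that formula by hand against scikit-learn on a small hypothetical matrix (the numbers are illustrative, not cereal data).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: rows are items, columns are features (hypothetical values)
X = np.array([[70.0, 4.0],
              [120.0, 3.0],
              [110.0, 1.0]])

# Min-max scaling by hand, applied column by column
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# MinMaxScaler applies the same per-column transformation
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

StandardScaler would instead subtract each column's mean and divide by its standard deviation, so values are not confined to [0, 1].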
Hierarchical Clustering:
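Run hierarchical (agglomerative) clustering with both single linkage and complete linkage. The code below uses scipy's linkage() function; scikit-learn's AgglomerativeClustering supports the same linkage options but does not plot dendrograms itself.
Plot a dendrogram for each method with scipy's dendrogram() function, which draws through matplotlib, to visualize the clustering structures.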
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt
# Perform hierarchical clustering with single linkage
single_linkage = linkage(data_normalized, method='single')
# Perform hierarchical clustering with complete linkage
complete_linkage = linkage(data_normalized, method='complete')
# Plot dendrogram for single linkage
plt.figure(figsize=(10, 5))
plt.title('Dendrogram - Single Linkage')
dendrogram(single_linkage)
plt.show()
# Plot dendrogram for complete linkage
plt.figure(figsize=(10, 5))
plt.title('Dendrogram - Complete Linkage')
dendrogram(complete_linkage)
plt.show()
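For comparison, the same two linkage methods can be run with scikit-learn's AgglomerativeClustering, which returns flat cluster labels instead of a tree. This is a minimal sketch on synthetic, well-separated data standing in for the normalized cereal features; the choice of n_clusters=3 is an assumption for illustration. The adjusted Rand index is one way to measure how closely the single- and complete-linkage partitions agree.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the normalized data: three well-separated groups of 10
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=10.0, scale=0.3, size=(10, 2)),
])
X = MinMaxScaler().fit_transform(X)

labels = {}
for method in ("single", "complete"):
    # n_clusters=3 is assumed here; in practice choose it from the dendrogram
    model = AgglomerativeClustering(n_clusters=3, linkage=method)
    labels[method] = model.fit_predict(X)
    print(method, "cluster sizes:", np.bincount(labels[method]))

# 1.0 means identical partitions (up to relabeling); values near 0 mean chance agreement
agreement = adjusted_rand_score(labels["single"], labels["complete"])
print("adjusted Rand index:", agreement)
```

On real data, single linkage often produces long chains and one dominant cluster, so the two methods may disagree far more than on this clean example.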
Interpretation:
Compare dendrograms and observe the structures of clusters formed.
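Calculate cluster centroids by assigning each cereal a cluster label (e.g., with scipy's fcluster()) and averaging every feature over each cluster's members, for example with pandas groupby() followed by aggregate('mean').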
Comment on the structures, stability, and meaningfulness of clusters formed by both methods.
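The centroid step can be sketched as follows. The six rows and the column names "calories" and "fiber" are hypothetical stand-ins; the tree is cut into two clusters with fcluster(criterion='maxclust'), and each centroid is the per-cluster mean of every feature. The same groupby works on the original, unnormalized columns, which are often easier to interpret.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical normalized features for six cereals (two obvious groups)
features = pd.DataFrame(
    {"calories": [0.10, 0.15, 0.12, 0.90, 0.85, 0.95],
     "fiber":    [0.80, 0.75, 0.85, 0.10, 0.20, 0.15]},
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)

# Build the complete-linkage tree on the feature values
Z = linkage(features.values, method="complete")

# Cut the tree into (at most) two flat clusters; fcluster labels start at 1
features["cluster"] = fcluster(Z, t=2, criterion="maxclust")

# Centroid = mean of each feature over the members of each cluster
centroids = features.groupby("cluster").aggregate("mean")
print(centroids)
```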
Selecting Number of Clusters:
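Decide on the number of clusters from the dendrogram: look for a level with a large vertical gap between successive merges, since merges above that gap join clusters that are far apart.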
Determine the cutoff distance (height on the dendrogram) to obtain the desired number of clusters.
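The merge heights that the dendrogram draws are stored in column 2 of scipy's linkage matrix, so the cutoff can be computed rather than read off the plot. In this sketch (synthetic data with three separated groups; k = 3 is an assumption), any height between the k-th and (k-1)-th highest merges leaves exactly k clusters when the merge heights increase monotonically, as they do for single and complete linkage. The "largest gap" rule at the end is a common heuristic, not a definitive criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Synthetic stand-in data: three groups of 8 points centered on an equilateral triangle
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
X = np.vstack([rng.normal(c, 0.2, size=(8, 2)) for c in centers])
Z = linkage(X, method="complete")

# Column 2 of the linkage matrix holds the merge heights in increasing order
heights = Z[:, 2]

k = 3  # desired number of clusters (assumed for this example)
# Cutting between the k-th and (k-1)-th highest merges leaves exactly k clusters
lower, upper = heights[-k], heights[-(k - 1)]
cutoff = (lower + upper) / 2
labels = fcluster(Z, t=cutoff, criterion="distance")
print(f"cutoff={cutoff:.3f}, clusters={len(np.unique(labels))}")

# Heuristic: cut inside the largest jump between successive merge heights
gaps = np.diff(heights)
suggested_k = len(heights) - int(np.argmax(gaps))
print("largest gap suggests", suggested_k, "clusters")
```

Using fcluster(Z, t=k, criterion='maxclust') instead asks for k clusters directly without choosing a height.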