Explain the k-means clustering algorithm and provide pseudocode for it. Implement the k-means clustering algorithm following your explanation and the pseudocode. In the implementation, select the initial cluster representatives randomly. (The code should be in Python.)

1 Answer

Final answer:

The k-means clustering algorithm is an unsupervised machine learning algorithm that partitions data into a predetermined number of clusters. It starts from randomly chosen centroids and iterates until convergence, reducing the within-cluster variance at each step. The Python code below implements this algorithm.

Step-by-step explanation:

Explanation of the k-means Clustering Algorithm

k-means clustering is a popular unsupervised machine learning algorithm that partitions unlabeled data into a predefined number of clusters, k. Each cluster is represented by its centroid, the mean of the points currently assigned to it (hence the name k-means). The algorithm alternates between assigning every data point to the nearest centroid and recomputing each centroid from its assigned points. The goal is to minimize the within-cluster variance, that is, the sum of squared distances between each point and the centroid of its cluster.
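
For reference, the quantity being minimized can be written out explicitly. The notation here is an assumption introduced for illustration and is not part of the original answer: C_j denotes the set of points assigned to cluster j, and \mu_j denotes that cluster's centroid.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Neither the assignment step nor the update step can increase J, which is why the procedure converges, although possibly to a local rather than a global minimum.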

Pseudo Code of k-means Clustering Algorithm

  1. Initialize k cluster centroids randomly.
  2. Repeat until convergence:
     a. Assign each data point to the nearest centroid based on the Euclidean distance.
     b. Update the centroid of each cluster to be the mean of the points assigned to it.
  3. Stop when the assignments do not change between iterations, or when a predefined number of iterations is reached.
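
To make the two steps inside the loop concrete, here is a minimal sketch of a single assignment and update pass on a toy dataset. The data values and the starting centroids are made up for illustration; a real run would pick the starting centroids at random.

```python
import numpy as np

# Toy data: two obvious groups on a line (values chosen for illustration)
X = np.array([[1.0], [2.0], [9.0], [10.0]])
# Hypothetical starting centroids; a real run would choose them randomly
centroids = np.array([[2.0], [3.0]])

# Assignment step: distance from every point to every centroid, shape (4, 2)
distances = np.linalg.norm(X[:, np.newaxis] - centroids, axis=2)
labels = np.argmin(distances, axis=1)

# Update step: each centroid becomes the mean of the points assigned to it
new_centroids = np.array(
    [X[labels == j].mean(axis=0) for j in range(len(centroids))]
)

print(labels.tolist())         # [0, 0, 1, 1]
print(new_centroids.tolist())  # [[1.5], [9.5]]
```

After this one pass the centroids have already moved to the middle of each group. A second pass would leave the assignments unchanged, which is the stopping condition described above.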

Python Implementation of k-means Clustering Algorithm
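import numpy as np

# K-means clustering implementation
def k_means(X, k, max_iters=100, random_seed=None):
    # Local generator so the global NumPy random state is left untouched
    rng = np.random.default_rng(random_seed)
    # Pick k distinct data points as the initial cluster representatives
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iters):
        # Assign each point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, np.newaxis] - centroids, axis=2)
        labels = np.argmin(distances, axis=1)
        # Recompute each centroid as the mean of its points; keep the old
        # centroid if a cluster is empty, to avoid a NaN mean
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return labels, centroids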

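# Example of using the k-means function with random initialization
# X must be a 2-dimensional NumPy array of shape (n_samples, n_features);
# synthetic data is generated here purely for illustration
X = np.random.default_rng(0).normal(size=(150, 2))
labels, cluster_centers = k_means(X, k=3, random_seed=42)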
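
Because the result depends on the random starting points, it can help to measure how good a clustering is. The sketch below computes the within-cluster sum of squared distances for a given set of labels and centroids. The helper name within_cluster_ss is hypothetical and not part of the answer above, and the data is hand-made so the result can be checked by hand.

```python
import numpy as np

def within_cluster_ss(X, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    # centroids[labels] selects, for every row of X, the centroid of its cluster
    diffs = X - centroids[labels]
    return float(np.sum(diffs ** 2))

# Hand-made example: two clusters of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])

# Every point is exactly 1 unit from its centroid, so the total is 4 * 1**2
print(within_cluster_ss(X, labels, centroids))  # 4.0
```

A lower value means tighter clusters for the same k. One common use is to run the algorithm with several seeds and keep the run with the smallest value.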

User Andrekos
by
7.8k points