K-means algorithm

Program's name: K-means algorithm
Available version(s): K-means (multi-version)
Programming language(s): C
Programming model(s): OpenMP

This repository contains an implementation of the k-means algorithm. K-means is an unsupervised machine learning algorithm used to cluster data. It partitions a set of data points into k distinct, non-overlapping subsets (clusters) based on their similarity.

In HPC environments, k-means is used for data mining on large data sets, for example gene expression analysis in bioinformatics.

The k-means algorithm consists of the following steps:

  • Initialization: Choose k random initial cluster centroids.

  • Assignment: Assign each data point to its nearest centroid.

  • Centroid Update: Update each centroid’s position based on the mean of all data points assigned to it.

  • Convergence Check: Stop when no centroid's position changes by more than a threshold, or when a maximum iteration count is reached. Otherwise, repeat from the Assignment step.
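The four steps above can be sketched in C as follows. This is a minimal illustration, not the code shipped in this repository: the function names (`sq_dist`, `init_centroids`, `kmeans`), the row-major data layout, the use of squared Euclidean distance as the similarity measure, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for the example. The OpenMP pragma marks the assignment loop as one plausible place to parallelize; it is ignored when the code is compiled without OpenMP support.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>

/* Hypothetical helper: squared Euclidean distance between two points. */
static double sq_dist(const double *a, const double *b, int dim)
{
    double s = 0.0;
    for (int d = 0; d < dim; d++) {
        double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

/* Initialization: copy k distinct, randomly chosen points into centroids
 * (partial Fisher-Yates shuffle). Returns 0 on success, -1 on failure. */
int init_centroids(const double *points, int n, int dim, int k,
                   double *centroids, unsigned seed)
{
    assert(n > 0 && dim > 0 && k > 0 && k <= n);
    int *idx = malloc((size_t)n * sizeof *idx);
    if (!idx) return -1;
    for (int i = 0; i < n; i++) idx[i] = i;
    srand(seed);
    for (int c = 0; c < k; c++) {
        int j = c + rand() % (n - c);
        int tmp = idx[c]; idx[c] = idx[j]; idx[j] = tmp;
        for (int d = 0; d < dim; d++)
            centroids[(size_t)c * dim + d] = points[(size_t)idx[c] * dim + d];
    }
    free(idx);
    return 0;
}

/* points:    n * dim values, row-major (assumed layout)
 * centroids: k * dim values; initial positions in, final positions out
 * labels:    n values out; labels[i] is the cluster index of point i
 * Returns the number of iterations performed, or -1 on allocation failure. */
int kmeans(const double *points, int n, int dim, int k,
           double *centroids, int *labels, int max_iter, double tol)
{
    assert(n > 0 && dim > 0 && k > 0 && k <= n);
    double *sums = malloc((size_t)k * dim * sizeof *sums);
    int *counts = malloc((size_t)k * sizeof *counts);
    if (!sums || !counts) { free(sums); free(counts); return -1; }

    int iter = 0;
    while (iter < max_iter) {
        iter++;

        /* Assignment: each point gets the index of its nearest centroid. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            int best = 0;
            double best_d = DBL_MAX;
            for (int c = 0; c < k; c++) {
                double d = sq_dist(&points[(size_t)i * dim],
                                   &centroids[(size_t)c * dim], dim);
                if (d < best_d) { best_d = d; best = c; }
            }
            labels[i] = best;
        }

        /* Centroid update: accumulate per-cluster sums and counts. */
        for (int j = 0; j < k * dim; j++) sums[j] = 0.0;
        for (int c = 0; c < k; c++) counts[c] = 0;
        for (int i = 0; i < n; i++) {
            counts[labels[i]]++;
            for (int d = 0; d < dim; d++)
                sums[(size_t)labels[i] * dim + d] += points[(size_t)i * dim + d];
        }

        /* Move each centroid to its cluster mean; track the largest shift. */
        double max_shift_sq = 0.0;
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) continue; /* assumption: empty cluster stays put */
            double shift_sq = 0.0;
            for (int d = 0; d < dim; d++) {
                double mean = sums[(size_t)c * dim + d] / counts[c];
                double diff = mean - centroids[(size_t)c * dim + d];
                shift_sq += diff * diff;
                centroids[(size_t)c * dim + d] = mean;
            }
            if (shift_sq > max_shift_sq) max_shift_sq = shift_sq;
        }

        /* Convergence check: compare squared shift with squared threshold. */
        if (max_shift_sq < tol * tol) break;
    }

    free(sums);
    free(counts);
    return iter;
}
```

A caller would typically run `init_centroids` once and then pass the same `centroids` buffer to `kmeans`. Only the assignment loop is parallelized here because its iterations are independent; the update loop writes to shared sums and would need a reduction or per-thread buffers to be parallelized safely. With GCC, OpenMP is enabled by the `-fopenmp` flag.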

Building and running the program

The application is usually built with:

$ make

To run it, select a pre-configured input size:

$ make run-small