This repository contains an implentation of the k-means algorithm. K-means is
an unsupervised machine learning algorithm used to cluster data. Its primary
purpose is to partition a set of data points into k distinct, non-overlapping
subsets (or clusters) based on their similarity.
In HPC environments, the k-means algorithm is employed to perform data-mining on large data sets, e.g. gene expression analysis in bioinformatics.
The k-means algorithm consists of the following steps:
Initialization: Choose k random initial cluster centroids.
Assignment: Assign each data point to their nearest centroid.
Centroid Update: Update each centroid’s position based on the mean of all data points assigned to it.
Convergence Check: Stop if the centroid’s position change was below a threshold or a maximum iteration count is reached.
Usually the application builds by:
$ make
And it can run by selecting the pre-configured input size:
$ make run-small