K-means (multi-version)

Version's name: K-means (multi-version) ; a version of the K-means algorithm program.
Repository: [home] and version downloads: [.zip] [.tar.gz] [.tar.bz2] [.tar]

This version of the code implements, via conditional compilation, multiple versions of the k-means algorithm to explore the vector opportunities each flavour may offer.

Pre-requisites

To build and execute the progam, you will need a c++ compiler and a python interpreter. By default the Makefile is configures to execute clang++, but you can easily change the compiler by properly setting the Makefile’s variable CXX.

Building the program

There are multiple versions implemented showing different behavior that can be controlled through preprocessor variables at compile time:

  • Extended Data Structure: The 2D point structure can be extended by an assignment variable indicating the centroid id to which the data point has been assigned. If not enabled, a separate array is used for assignments.

  • Assignment Fission: This option separates the cluster assignment step from the minimum distance loop. This approach enhances vectorization.

  • Nested Centroid Update: The first version updates all centroids simultaneously by iterating over all data points once and calculating the mean for all centroids at the same time. The second version iterates over all data points once for each centroid, enabling better vectorization.

  • Convergence Check: The convergence check can be switched off to force maximum iteration count.

Once the building options are properly set, run:

$ make

That command will generate the kmens.exe program.

Generating additional input file

$python input/gen_input.py --file input/test.in --k 5 --n 1000 --dim_x 100 --dim_y 100
Generated 1000 points around 5 centroids. Written to input/test.in.

Running the program

In order to run the program, you can directly execute the command line manually providing the required paramenters:

OMP_NUM_THREADS=1 ./kmeans.exe input/small.in  <DIM> <NPOINTS> <NCENTERS> <NITERS>

For example:

OMP_NUM_THREADS=1 ./kmeans.exe input/small.in 100 1000 5 20

Or you can rely on the pre-configured small, mid, and large workloads, example:

$ make run-small
The following experiments have been registered: