ASA136 is a FORTRAN90 library, using double precision arithmetic, which divides M points in N dimensions into K clusters so that the within-clusters sum of squares is minimized.
ASA136 is Applied Statistics Algorithm 136. Source code for many Applied Statistics Algorithms is available through STATLIB.
In the K-Means problem, a set of N points X(I) in M-dimensions is given. The goal is to arrange these points into K clusters, with each cluster having a representative point Z(J), usually chosen as the centroid of the points in the cluster. The energy of each cluster is
E(J) = Sum ( all points X(I) in cluster J ) || X(I) - Z(J) ||^2
For a given set of clusters, the total energy is then simply the sum of the cluster energies E(J). The goal is to choose the clusters in such a way that the total energy is minimized. Usually, a point X(I) goes into the cluster with the closest representative point Z(J). So to define the clusters, it's enough simply to specify the locations of the cluster representatives.
This is actually a fairly hard problem. Most algorithms do reasonably well, but cannot guarantee that the best solution has been found. It is very common for algorithms to get stuck at a solution which is merely a "local minimum". For such a local minimum, every slight rearrangement of the solution makes the energy go up; however a major rearrangement would result in a big drop in energy.
A simple algorithm for the problem is known as "H-Means". It alternates between two procedures:
A more sophisticated algorithm, known as "K-Means", takes advantage of the fact that it is possible to quickly determine the decrease in energy caused by moving a point from its current cluster to another. It repeats the following procedure:
ASA058 is a FORTRAN90 library which carries out the K-means algorithm for clustering data.
ASA113 is a FORTRAN90 library which implements the Banfield and Bassill clustering algorithm using transfers and swaps.
ASA136 is also available in a C++ version and a FORTRAN77 version and a MATLAB version.
CITIES is a FORTRAN90 library which handles various problems associated with a set of "cities" on a map.
CITIES is a dataset directory which contains a number of city distance datasets.
KMEANS is a FORTRAN90 library which contains several implementations of the H-Means and K-Means clustering algorithms.
LAU_NP is a FORTRAN90 library which contains heuristic algorithms for the K-center and K-median problems.
SPAETH is a FORTRAN90 library which clusters data according to various principles.
SPAETH is a dataset directory which contains test data for clustering.
SPAETH2 is a FORTRAN90 library which can cluster data according to various principles.
SPAETH2 is a dataset directory which contains test data for clustering.
You can go up one level to the FORTRAN90 source codes.