CVT_BASIS
Data Clustering by K-Means Techniques


CVT_BASIS is a FORTRAN90 program, using double precision arithmetic, which computes good cluster centers for a set of data.

The clustering process uses the K-Means algorithm, which can be considered to be a discrete version of the CVT algorithm (Centroidal Voronoi Tessellation).

The data is a collection of vectors, with each vector stored in a separate file. The files are presumed to have "sequential" names, such as "fred01.txt", "fred02.txt", and so on. Each file must be a TABLE file, that is a series of N lines, with M values on every line (although comment lines may be inserted as well.)

The program is given the name of the first file in the sequence. It reads the data from each file in the sequence, and carries out the K Means clustering process to determine K cluster centers. It writes each of these cluster centers out to a separate file.

The cluster centers will generally be "well spread out" in the space spanned by the set of data. Such a set might be useful, for instance, in determining a basis for a low-dimensional approximation of the data.

INPUT: at run time, the user specifies:

Related Data and Programs:

BURGERS is a data set directory which contains solutions of the 1 dimensional Burgers equation;

CAVITY_FLOW is a dataset directory which contains solutions of a driven cavity flow in 2D;

CVT_BASIS_FLOW is a FORTRAN90 program which is similar to CVT_BASIS, but is specialized to handle a particular family of fluid flow solutions.

INOUT_FLOW is a dataset directory which contains solutions for flow in and out of a chamber in 2D;

INOUT_FLOW2 is a dataset directory which contains solutions for flow in and out of a chamber in 2D, using a finer grid and more timesteps;

SVD_BASIS is a FORTRAN90 program which uses the singular value decomposition to extract representative modes from a set of data vectors.

TABLE is a file format used for the input and output files.

TCELL_FLOW is a dataset directory which contains solutions for flow through a T-cell in 2D;

Reference:

  1. Franz Aurenhammer,
    Voronoi diagrams - a study of a fundamental geometric data structure,
    ACM Computing Surveys,
    Volume 23, Number 3, pages 345-405, September 1991,
    ../../pdf/aurenhammer.pdf
  2. John Burkardt, Max Gunzburger, Hyung-Chun Lee,
    Centroidal Voronoi Tessellation-Based Reduced-Order Modelling of Complex Systems,
    SIAM Journal on Scientific Computing,
    Volume 28, Number 2, 2006, pages 459-484.
  3. John Burkardt, Max Gunzburger, Janet Peterson and Rebecca Brannon,
    User Manual and Supporting Information for Library of Codes for Centroidal Voronoi Placement and Associated Zeroth, First, and Second Moment Determination,
    Sandia National Laboratories Technical Report SAND2002-0099,
    February 2002,
    ../../publications/bgpb_2002.pdf
  4. Qiang Du, Vance Faber, and Max Gunzburger,
    Centroidal Voronoi Tessellations: Applications and Algorithms,
    SIAM Review, Volume 41, 1999, pages 637-676.
  5. Lili Ju, Qiang Du, and Max Gunzburger,
    Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations,
    Parallel Computing,
    Volume 28, 2002, pages 1477-1500.
  6. Wendy Martinez and Angel Martinez,
    Computational Statistics Handbook with MATLAB,
    Chapman and Hall / CRC, 2002.

Source Code:

Examples and Tests:

List of Routines:

You can go up one level to the FORTRAN90 source codes.


Last revised on 12 November 2006.