CVT_BASIS_FLOW
PDE Model Reduction by Voronoi Techniques
CVT_BASIS_FLOW is a FORTRAN90 program,
using double precision arithmetic,
which extracts representative solution
modes from a set of solutions to a fluid flow PDE.
The selection process uses K-Means clustering, which can be considered
to be a discrete version of the CVT algorithm (Centroidal Voronoi
Tessellation).
The selected modes will generally be "well spread out" in the space
spanned by the set of solutions. Such a set of modes might be useful
as a basis for a low-dimensional approximation of new solutions,
as long as it may be assumed that these new solutions do not
have significant components that were not evident
in the original solution data.
Specifically, a partial differential equation (PDE) has been
defined, specifying the time dependent flow of a fluid through
a region. The PDE specification includes a parameter ALPHA
whose value strongly affects the behavior of the flow. The
steady state solution SS is computed for a particular value
of ALPHA. Then the time dependent problem is solved over a
fixed time interval, with ALPHA varying from time to time.
A set of several hundred solutions S(T(I),ALPHA(I)) is saved.
The goal is to extract from this solution data the
typical modes of behavior of the solution. Such a set of modes
may then be used as a finite element basis that is highly tuned
to the physics of the problem, so that a very small set of
basis functions can be used to closely approximate the behavior
of the solution over a range of values of ALPHA.
The method of extracting information from the solution data
uses a form of K-Means clustering.
Each solution is treated as a data record, a point in N dimensional
space. The program will try to cluster the data, that is, to organize
the data by defining a number of cluster centers, which are
also points in N dimensional space, and assigning each record
to the cluster associated with a particular center.
The method of assigning data aims to minimize the cluster energy,
which is taken to be the sum of the squares of the distances of
each data point from its cluster center.
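As a rough illustration of the energy and of one way to reduce it, here is a minimal Python sketch. It is not the program's FORTRAN90 code: the function names, the stopping test, and the tiny data set are all assumptions made for the example. It alternates between assigning each point to its nearest center and moving each center to the mean of its members.

```python
# Illustrative sketch only; names and data are hypothetical,
# not taken from the CVT_BASIS_FLOW source.

def dist_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist_sq(p, centers[j]))
            for p in points]

def cluster_energy(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(dist_sq(p, centers[k]) for p, k in zip(points, labels))

def update_centers(points, centers, labels):
    """Move each center to the mean of its members; empty clusters stay put."""
    new = []
    for j, c in enumerate(centers):
        members = [p for p, k in zip(points, labels) if k == j]
        if members:
            n = len(members)
            new.append([sum(col) / n for col in zip(*members)])
        else:
            new.append(list(c))
    return new

def kmeans(points, centers, it_max=50):
    """Alternate assignment and centroid update until labels stop changing."""
    labels = assign(points, centers)
    for _ in range(it_max):
        centers = update_centers(points, centers, labels)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels, cluster_energy(points, centers, labels)

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
centers, labels, energy = kmeans(points, [[0.0, 0.0], [10.0, 1.0]])
print(centers, labels, energy)
```

The result depends on the starting centers, which is why it is usual to repeat the process from several random starts and keep the clustering with the lowest energy.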
In some contexts, it makes sense to use the usual Euclidean sort
of distance. In others, it may make more sense to replace each
data record by a normalized version, and to assign distance
by computing angles between the unit vectors.
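The text does not say exactly how the angle is turned into a distance, so the following Python sketch simply shows two natural candidates, under that assumption: the angle itself, and the squared Euclidean distance between the two unit vectors, which equals 2 - 2 cos(theta). The function names are hypothetical.

```python
import math

# Illustrative sketch; the exact formula used by the program is an assumption.

def unit(v):
    """Return v scaled to unit Euclidean norm; zero vectors are rejected."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def angle_between(a, b):
    """Angle in radians between the directions of a and b."""
    cos_t = sum(x * y for x, y in zip(unit(a), unit(b)))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against roundoff

def chord_sq(a, b):
    """Squared Euclidean distance between the unit vectors: 2 - 2 cos(theta)."""
    return sum((x - y) ** 2 for x, y in zip(unit(a), unit(b)))

a = [3.0, 0.0]
b = [0.0, 0.5]
print(angle_between(a, b), chord_sq(a, b))
```

Either measure ignores the length of the data vectors, so two solutions that differ only by a scale factor are treated as identical.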
Because the data comes from a finite element computation, and
the results may be used as a new reduced basis, it may be
desirable to carry out mass matrix preconditioning of the data,
so that output vectors (cluster generators) are pairwise orthogonal
in the L2 inner product (integration of the product of the finite
element functions over the domain).
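One common way to bring the L2 inner product into a clustering code is sketched below in Python. It assumes, and this is an assumption rather than a description of the program, that the mass matrix M is factored as M = L L^T, that each coefficient vector u is replaced by L^T u before clustering, and that results are mapped back by solving with L^T. With that change of variables, the ordinary dot product of the transformed vectors equals the L2 inner product u^T M v. The 3 by 3 matrix is a toy example (two linear elements of unit length in one dimension), not one of the flow problems.

```python
import math

# Illustrative sketch; all names and the toy matrix are hypothetical.

def cholesky(m):
    """Lower-triangular L with L L^T = m, for symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def lt_times(low, u):
    """Return y = L^T u."""
    n = len(u)
    return [sum(low[k][i] * u[k] for k in range(i, n)) for i in range(n)]

def lt_solve(low, y):
    """Solve L^T u = y by back substitution."""
    n = len(y)
    u = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(low[k][i] * u[k] for k in range(i + 1, n))
        u[i] = s / low[i][i]
    return u

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

mass = [[2 / 6, 1 / 6, 0.0], [1 / 6, 4 / 6, 1 / 6], [0.0, 1 / 6, 2 / 6]]
u = [1.0, 2.0, 3.0]
v = [0.5, -1.0, 2.0]

low = cholesky(mass)
m_inner = dot(u, [dot(row, v) for row in mass])      # u^T M v
e_inner = dot(lt_times(low, u), lt_times(low, v))    # (L^T u) . (L^T v)
u_back = lt_solve(low, lt_times(low, u))             # recovers u
print(m_inner, e_inner, u_back)
```

A production code would use a banded factorization from a library such as LAPACK instead of the dense loops shown here.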
Because the results may be used as a new reduced basis, it may be
desirable, once the results have been computed, to apply a
Gram-Schmidt orthogonalization procedure, so that the basis
vectors have unit Euclidean norm, and are pairwise orthogonal.
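For reference, a small Python sketch of the modified Gram-Schmidt process follows. It is a generic version written for this note, not the routine used by the program; the tolerance and the choice to skip linearly dependent vectors are assumptions of the sketch.

```python
import math

# Illustrative sketch; not the program's own orthogonalization code.

def gram_schmidt(vectors, tol=1.0e-12):
    """Modified Gram-Schmidt: orthonormal vectors, dependent inputs skipped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(a * b for a, b in zip(q, w))
            w = [a - coeff * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis

# The third vector is the sum of the first two, so it adds nothing new.
generators = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]]
basis = gram_schmidt(generators)
print(len(basis), basis)
```

Note that the orthonormalized vectors span the same space as the cluster generators, but they are no longer cluster centers themselves.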
The current version of the program assumes that a steady state
solution SS of the PDE is known, and that a multiple
of SS is to be subtracted from each solution vector before processing.
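The subtraction step can be sketched in Python as follows. The default factors 5/3 and 1/3 and the split at solution 250 are the ones listed for run_type 6; the function name and the two-entry test data are made up for the illustration.

```python
# Illustrative sketch; the function name and sample data are hypothetical.

def subtract_steady(solutions, ss, split=250, first=5.0 / 3.0, second=1.0 / 3.0):
    """Subtract first*ss from solutions 1..split and second*ss from the rest."""
    out = []
    for i, sol in enumerate(solutions, start=1):  # 1-based, as in the text
        factor = first if i <= split else second
        out.append([s - factor * x for s, x in zip(sol, ss)])
    return out

ss = [3.0, 6.0]
solutions = [[5.0, 10.0], [1.0, 2.0]]
# With split=1, solution 1 loses 5/3 SS and solution 2 loses 1/3 SS.
shifted = subtract_steady(solutions, ss, split=1)
print(shifted)
```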
FILES: the program assumes the existence of the following files.
The actual names of the files are specified by the user at run time;
the names used here are just suggestions.
-
xy.txt, contains the coordinates of each node, with
one pair of coordinates per line of the file;
-
ss.txt, contains the steady state solution values at each
node; normally, there are two values per node (horizontal and
vertical velocity). However, the program will accept data
that is scalar, or with a higher number of components than 2.
Most of the ensuing discussion assumes that the number of
components is 2, but that's just because that is the problem
we are usually working on;
-
uv01.txt, uv02.txt, ..., contain the solution values
at each node for solution 1, 2, and so on; the number of components
(normally 2) must be the same as for the steady state solution
file;
-
element.txt, contains the indices of the six nodes that
make up each element, with one set of six indices per line of
the file (only needed if mass matrix
preconditioning is used).
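To make the file layout concrete, here is a small Python sketch that parses such a table. It assumes whitespace-separated values and treats lines beginning with "#" as comments; the sample contents and the node-by-node ordering of the flattened vector are assumptions for the example, not a specification of the program's reader.

```python
# Illustrative sketch; sample data and flattening order are hypothetical.

def read_table(lines):
    """Parse whitespace-separated numeric rows, skipping blanks and '#' comments."""
    rows = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue
        rows.append([float(tok) for tok in text.split()])
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have differing numbers of columns")
    return rows

sample = """# made-up contents of a uv file: u and v at three nodes
0.0  0.0
1.5 -0.25
2.0  0.125
""".splitlines()

uv = read_table(sample)
# Flatten into one data vector (u1, v1, u2, v2, ...).
vector = [value for row in uv for value in row]
print(uv, vector)
```

To read an actual file, pass the open file object instead of the sample lines, for example read_table(open("uv01.txt")).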
INPUT: at run time, the user specifies:
-
run_type describes how we subtract off the steady state,
whether we drop some data, and other options. The current
values range from 1 to 8. The most common value is 6, used
with the TCELL data:
-
1: no steady state file is used, no preprocessing is carried out;
-
2: no steady state file is used, no preprocessing is carried out;
-
3: subtract 1/3 SS from solution 1, 5/3 SS from solutions 2 to 201,
and 1/3 SS from solutions 202 through 401;
-
4: subtract 1/3 SS from solution 1, 5/3 SS from solutions 2 to 201,
and 1/3 SS from solutions 202 through 401,
and drop the even-numbered data;
-
5: subtract 1/3 SS from solution 1, 5/3 SS from solutions 2 to 201,
and 1/3 SS from solutions 202 through 401,
and skip half the data and normalize it;
-
6: subtract 5/3 SS from solutions 1 to 250,
and 1/3 SS from solutions 251 through 500, do not
normalize;
-
7: subtract 5/3 SS from solutions 1 to 250,
and 1/3 SS from solutions 251 through 500,
normalize the data;
-
8: subtract 5/3 SS from solutions 1 to 250,
and 1/3 SS from solutions 251 through 500, then
drop the odd-numbered data, do not
normalize.
-
xy_file, the name of the xy file containing the
node coordinates;
-
steady_file, the name of the steady state solution file,
or "none" if the data does not need to be preprocessed (run_type
1 or 2);
-
uv0_file, the name of the first solution file (the program
will assume all the files are numbered consecutively).
You may specify more than one family of solution files.
After each family, enter the name of the first file in the
next family, or "none" if there are no more families.
Up to 10 separate families of files are allowed.
-
cluster_lo, cluster_hi, the range of numbers of clusters to check.
In most cases, you simply want to specify the same number
for both these values, namely, the requested basis size.
-
cluster_it_max, the number of different times you want to
try to cluster the data; I often use 15.
-
energy_it_max, the number of times you want to try to improve
a given clustering by swapping points from one cluster to another;
I often use 50 or 100.
-
element_file, the name of the element file, if mass matrix
preconditioning is desired, or else "none".
-
normal, 0 to use raw data, 1 to normalize; here, after
we have subtracted the steady state and preconditioned the data
vectors, we are offering also to make each data vector have
unit norm before clustering. At the moment, I'm working with
the raw data.
-
comment, "Y" if initial comments may be included at the
beginning of the output files. These comments always start with
a "#" character in column 1.
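Since the solution files in a family are assumed to be numbered consecutively, the list of names can be generated from the first one. The Python sketch below shows one way to do that, by incrementing the last run of digits in the name and keeping its width. It is a hypothetical helper written for this note; the program's own FILE_NAME_INC routine may treat digits differently.

```python
import re

# Illustrative sketch; not the program's FILE_NAME_INC routine.

def next_file_name(name):
    """Increment the last run of digits in name, keeping its width.

    The counter wraps to zeros on overflow, e.g. uv99.txt -> uv00.txt.
    """
    matches = list(re.finditer(r"[0-9]+", name))
    if not matches:
        raise ValueError("no digits to increment in " + repr(name))
    m = matches[-1]
    width = len(m.group())
    value = (int(m.group()) + 1) % (10 ** width)
    return name[:m.start()] + str(value).zfill(width) + name[m.end():]

def family(first_name, count):
    """Names of count consecutively numbered files, starting at first_name."""
    names = [first_name]
    for _ in range(count - 1):
        names.append(next_file_name(names[-1]))
    return names

print(family("uv01.txt", 3))
```

A file extension that itself contains digits would confuse this simple version, so it is only suitable for names like the ones suggested above.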
OUTPUT: the program computes basis_num basis vectors.
The first vector is written to the file gen_001.txt, and the
others to consecutively numbered files. As with the solution files,
the output vectors are written with two values per line, since
these represent the two components of velocity at a particular
node.
Linkage:
-
The program calls numerous LAPACK routines for the processing
of the mass matrix. The text for these routines is not included
in the source code. The compiled program must be linked to
the LAPACK library.
Related Data and Programs:
CVT_BASIS
is a FORTRAN90 program which
is similar to CVT_BASIS_FLOW, but handles any general
set of data vectors.
POD_BASIS_FLOW
is a FORTRAN90 program which
is similar to CVT_BASIS_FLOW,
but uses POD methods to extract representative modes from the data.
Reference:
-
Franz Aurenhammer,
Voronoi diagrams -
a study of a fundamental geometric data structure,
ACM Computing Surveys,
Volume 23, Number 3, pages 345-405, September 1991,
../../pdf/aurenhammer.pdf
-
John Burkardt, Max Gunzburger, Hyung-Chun Lee,
Centroidal Voronoi Tessellation-Based Reduced-Order
Modelling of Complex Systems,
SIAM Journal on Scientific Computing,
Volume 28, Number 2, 2006, pages 459-484.
-
John Burkardt, Max Gunzburger, Janet Peterson and Rebecca Brannon,
User Manual and Supporting Information for Library of Codes
for Centroidal Voronoi Placement and Associated Zeroth,
First, and Second Moment Determination,
Sandia National Laboratories Technical Report SAND2002-0099,
February 2002,
../../publications/bgpb_2002.pdf
-
Qiang Du, Vance Faber, Max Gunzburger,
Centroidal Voronoi Tessellations: Applications and Algorithms,
SIAM Review, Volume 41, 1999, pages 637-676.
-
Lili Ju, Qiang Du, Max Gunzburger,
Probabilistic methods for centroidal Voronoi tessellations
and their parallel implementations,
Parallel Computing,
Volume 28, 2002, pages 1477-1500.
-
Wendy Martinez, Angel Martinez,
Computational Statistics Handbook with MATLAB,
Chapman and Hall / CRC, 2002.
Examples and Tests:
PDE solution datasets you may copy include:
-
CAVITY, the driven cavity;
-
INOUT, flow in and out of a chamber;
-
INOUT #2, flow in and out of a chamber, using a finer grid
and more timesteps;
-
TCELL, flow through a T-cell.
This program has been run with a number of different datasets,
and with various requirements as to normalization and so on.
The purpose of most of the runs is to find a generator set of
given size. The input and output of each run is stored in
a separate subdirectory.
The first set of runs works with 500 flow solutions in the TCELL region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We DON'T normalize the PDE solutions.
The next set of runs works with 500 flow solutions in the TCELL region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. Now we NORMALIZE the PDE solutions before processing them.
The next set of runs works with 500 flow solutions in the TCELL region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We DON'T normalize the PDE solutions. We discard
half the data, keeping the EVEN steps, 2, 4, ..., 500.
The next set of runs works with 500 flow solutions in the INOUT region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We DON'T normalize the PDE solutions.
The next set of runs works with 500 flow solutions in the INOUT region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We NORMALIZE the PDE solutions.
The next set of runs works with 500 flow solutions in the INOUT region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We DON'T normalize the PDE solutions. Before
we proceed, we DROP the ODD numbered PDE solutions.
The next set of runs works with 500 flow solutions in the CAVITY region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We DON'T normalize the PDE solutions.
The next set of runs works with 500 flow solutions in the CAVITY region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We NORMALIZE the PDE solutions.
The next set of runs works with 500 flow solutions in the CAVITY region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We DON'T normalize the PDE solutions. Before
we proceed, we DROP the ODD numbered PDE solutions.
The next set of runs works with 500 flow solutions in the CAVITY region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We normalize the PDE solutions. We use MASS MATRIX
preconditioning.
The next set of runs works with 500 flow solutions in the INOUT region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We normalize the PDE solutions. We use MASS MATRIX
preconditioning.
The next set of runs works with 500 flow solutions in the TCELL region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We normalize the PDE solutions. We use MASS MATRIX
preconditioning.
The next set of runs works with 500 flow solutions in the CAVITY region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We do not normalize the PDE solutions. We use MASS MATRIX
preconditioning.
The next set of runs works with 500 flow solutions in the INOUT region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We do not normalize the PDE solutions. We use MASS MATRIX
preconditioning.
The next set of runs works with 500 flow solutions in the TCELL region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We do not normalize the PDE solutions. We use MASS MATRIX
preconditioning.
-
run 64, 4 elements;
-
run 78, 5 elements;
-
run 81, 6 elements;
-
run 79, 7 elements;
-
run 65, 8 elements;
-
run 82, 9 elements;
-
run 80, 10 elements;
-
run 83, 11 elements;
-
run 84, 12 elements;
-
run 85, 13 elements;
-
run 86, 14 elements;
-
run 87, 15 elements;
-
run 66, 16 elements;
-
run 88, 17 elements;
-
run 89, 18 elements;
-
run 90, 19 elements;
-
run 91, 20 elements;
The next set of runs works with 500 flow solutions in the CAVITY region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We do not normalize the PDE solutions. We drop the
odd numbered data vectors. We use MASS MATRIX preconditioning.
The next set of runs works with 500 flow solutions in the INOUT region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We do not normalize the PDE solutions. We drop the
odd numbered data vectors. We use MASS MATRIX preconditioning.
The next set of runs works with 500 flow solutions in the TCELL region.
We subtract 5/3 of the steady solution from 1-250, and 1/3 from 251
through 500. We do not normalize the PDE solutions. We drop the
odd numbered data vectors. We use MASS MATRIX preconditioning.
The next set of runs works with 800 flow solutions in the INOUT2 region.
We subtract 5/3 of the steady solution from 1-400, and 1/3 from 401
through 800. We DON'T normalize the PDE solutions.
The next set of runs works with 800 flow solutions in the INOUT2 region.
We subtract 5/3 of the steady solution from 1-400, and 1/3 from 401
through 800. We DON'T normalize the PDE solutions.
We use MASS MATRIX preconditioning.
The next set of runs works with 40 scalar solutions of the
one-dimensional BURGERS equation.
List of Routines:
-
MAIN is the main routine for the CVT_BASIS_FLOW program.
-
ANALYSIS_NORMAL computes the energy for a range of cluster counts.
-
ANALYSIS_RAW computes the energy for a range of cluster counts.
-
BANDWIDTH_DETERMINE computes the lower bandwidth of a finite element matrix.
-
CH_CAP capitalizes a single character.
-
CH_EQI is a case insensitive comparison of two characters for equality.
-
CH_IS_DIGIT returns .TRUE. if a character is a decimal digit.
-
CH_TO_DIGIT returns the integer value of a base 10 digit.
-
CLUSTER_CENSUS computes and prints the population of each cluster.
-
CLUSTER_INITIALIZE_RAW initializes the cluster centers to random values.
-
CLUSTER_LIST prints out the assignments.
-
DATA_TO_GNUPLOT writes data to a file suitable for processing by GNUPLOT.
-
DIGIT_INC increments a decimal digit.
-
DIGIT_TO_CH returns the character representation of a decimal digit.
-
DISTANCE_NORMAL_SQ computes the distance between normalized vectors.
-
DTABLE_DATA_READ reads data from a double precision table file.
-
DTABLE_DATA_WRITE writes data to a double precision table file.
-
DTABLE_HEADER_READ reads the header from a double precision table file.
-
DTABLE_HEADER_WRITE writes the header to a double precision table file.
-
DTABLE_WRITE writes a double precision table file.
-
ENERGY_NORMAL computes the total energy of a given clustering.
-
ENERGY_RAW computes the total energy of a given clustering.
-
FILE_COLUMN_COUNT counts the number of columns in the first line of a file.
-
FILE_EXIST reports whether a file exists.
-
FILE_NAME_INC generates the next filename in a series.
-
FILE_ROW_COUNT counts the number of row records in a file.
-
GET_UNIT returns a free FORTRAN unit number.
-
HMEANS_NORMAL seeks the minimal energy of a clustering of a given size.
-
HMEANS_RAW seeks the minimal energy of a clustering of a given size.
-
I4_INPUT prints a prompt string and reads an integer from the user.
-
I4_RANGE_INPUT reads a pair of integers from the user, representing a range.
-
I4_UNIFORM returns a scaled pseudorandom I4.
-
ITABLE_DATA_READ reads data from an integer table file.
-
ITABLE_HEADER_READ reads the header from an integer table file.
-
I4VEC_PRINT prints an integer vector.
-
KMEANS_NORMAL tries to improve a partition of points.
-
KMEANS_RAW tries to improve a partition of points.
-
MASS_MATRIX computes the mass matrix.
-
NEAREST_CLUSTER_NORMAL finds the cluster nearest to a data point.
-
NEAREST_CLUSTER_RAW finds the cluster nearest to a data point.
-
NEAREST_POINT finds the center point nearest a data point.
-
POINT_GENERATE generates data points for the problem.
-
POINT_PRINT prints out the values of the data points.
-
R8VEC_NORM2 returns the 2-norm of a vector.
-
R8VEC_RANGE_INPUT reads two double precision vectors from the user, representing a range.
-
R8VEC_UNIT_EUCLIDEAN normalizes an N-vector in the Euclidean norm.
-
RANDOM_INITIALIZE initializes the FORTRAN 90 random number seed.
-
REFQBF evaluates a reference element quadratic basis function.
-
S_BLANK_DELETE removes blanks from a string, left justifying the remainder.
-
S_EQI is a case insensitive comparison of two strings for equality.
-
S_INPUT prints a prompt string and reads a string from the user.
-
S_OF_I4 converts an integer to a left-justified string.
-
S_REP_CH replaces all occurrences of one character by another.
-
S_TO_I4 reads an I4 from a string.
-
S_TO_I4VEC reads an integer vector from a string.
-
S_TO_R8 reads an R8 from a string.
-
S_TO_R8VEC reads an R8VEC from a string.
-
S_WORD_COUNT counts the number of "words" in a string.
-
TIMESTAMP prints the current YMDHMS date as a time stamp.
-
TIMESTRING writes the current YMDHMS date into a string.
-
TRIANGLE_UNIT_SET sets a quadrature rule in a unit triangle.
Last revised on 12 November 2006.