SVD_BASIS
Extract singular vectors from data
SVD_BASIS is an executable C++ program, using double precision
arithmetic, that applies the singular value decomposition to
a set of data vectors, to extract the leading "modes" of the data.
This program is intended as an intermediate application, in
the following situation:
-
A "high fidelity" or "high resolution" PDE solver is used
to determine many (say N = 500) solutions of a discretized
PDE at various times, or parameter values. Each solution
may be regarded as an M vector. Typically, each solution
involves an M by M linear system, greatly reduced in
complexity because of bandedness or sparsity.
-
This program is applied to extract L dominant modes from
the N solutions. This is done using the singular value
decomposition of the M by N matrix, each of whose columns
is one of the original solution vectors.
-
A "reduced order model" program may then attempt to solve
a discretized version of the PDE, using the L dominant
modes as basis vectors. Typically, this means that a dense
L by L linear system will be involved.
Thus, the program might read in 500 files, and write out
5 or 10 files of the corresponding size and "shape", representing
the dominant solution modes.
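To make the sizes concrete, here is a hedged sketch of where the dense L by L system could come from. It assumes a linear discretized problem and a Galerkin projection onto the modes; neither assumption is stated in this document, and the symbols K, f, c and U_L are introduced only for illustration.

```latex
\begin{aligned}
K\,x &= f, & K &\in \mathbb{R}^{M \times M} \text{ (sparse or banded)},\\
x &\approx U_L\,c, & U_L &\in \mathbb{R}^{M \times L},\quad c \in \mathbb{R}^{L},\\
\left( U_L^{T} K\, U_L \right) c &= U_L^{T} f, & U_L^{T} K\, U_L &\in \mathbb{R}^{L \times L} \text{ (generally dense)}.
\end{aligned}
```

Here the columns of U_L are the L basis vectors written out by this program, and c holds the L unknown coefficients of the reduced model.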
To compute the singular value decomposition, we first construct
the M by N matrix A using individual solution vectors
as columns:
A = [ X1 | X2 | ... | XN ]
The singular value decomposition has the form:
A = U * S * V'
and is determined using the DSVDC routine from the linear algebra
package LINPACK.
The leading L columns of the orthogonal M by M
matrix U, associated with the L largest singular values (the diagonal
entries of S), are chosen to form the basis.
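The idea of taking the leading columns of U can be sketched in a few lines of C++. The sketch below does not call DSVDC; it swaps in a one-sided Jacobi iteration, which is short enough to show in full. The names LeadingModes and leading_modes are hypothetical, and the code assumes that the L leading singular values are nonzero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: the l largest singular values and the
// matching left singular vectors (m-by-l, stored column by column).
struct LeadingModes {
  std::vector<double> sigma;
  std::vector<double> u;
};

// a is an m-by-n matrix stored column by column: entry (i,j) is a[i + j*m].
// NOTE: this uses one-sided Jacobi rotations, not LINPACK's DSVDC.
LeadingModes leading_modes(std::vector<double> a, std::size_t m,
                           std::size_t n, std::size_t l) {
  assert(a.size() == m * n && l <= n);
  // Rotate pairs of columns until all columns are mutually orthogonal.
  for (int sweep = 0; sweep < 60; ++sweep) {
    bool rotated = false;
    for (std::size_t p = 0; p + 1 < n; ++p) {
      for (std::size_t q = p + 1; q < n; ++q) {
        double alpha = 0.0, beta = 0.0, gamma = 0.0;
        for (std::size_t i = 0; i < m; ++i) {
          alpha += a[i + p * m] * a[i + p * m];
          beta += a[i + q * m] * a[i + q * m];
          gamma += a[i + p * m] * a[i + q * m];
        }
        if (std::fabs(gamma) <= 1e-14 * std::sqrt(alpha * beta)) continue;
        rotated = true;
        const double zeta = (beta - alpha) / (2.0 * gamma);
        const double t = (zeta >= 0.0 ? 1.0 : -1.0) /
                         (std::fabs(zeta) + std::sqrt(1.0 + zeta * zeta));
        const double c = 1.0 / std::sqrt(1.0 + t * t);
        const double s = c * t;
        for (std::size_t i = 0; i < m; ++i) {
          const double ap = a[i + p * m];
          const double aq = a[i + q * m];
          a[i + p * m] = c * ap - s * aq;
          a[i + q * m] = s * ap + c * aq;
        }
      }
    }
    if (!rotated) break;
  }
  // The column norms are now the singular values; the normalized
  // columns are the corresponding left singular vectors.
  std::vector<double> norm(n, 0.0);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < m; ++i) norm[j] += a[i + j * m] * a[i + j * m];
    norm[j] = std::sqrt(norm[j]);
  }
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(),
            [&norm](std::size_t x, std::size_t y) { return norm[x] > norm[y]; });
  LeadingModes out;
  out.sigma.resize(l);
  out.u.assign(m * l, 0.0);
  for (std::size_t k = 0; k < l; ++k) {
    const std::size_t j = order[k];
    out.sigma[k] = norm[j];
    if (norm[j] > 0.0)  // a zero singular value leaves its column as zeros
      for (std::size_t i = 0; i < m; ++i) out.u[i + k * m] = a[i + j * m] / norm[j];
  }
  return out;
}
```

Calling leading_modes(a, m, n, l) with the solution vectors stored as the columns of a returns l unit-norm basis vectors, ordered by decreasing singular value. For a production run, a library routine such as DSVDC remains the better choice.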
In most PDEs, the solution vector has some structure; perhaps
there are 100 nodes, and at each node the solution has perhaps
4 components (horizontal and vertical velocity, pressure, and
temperature, say). While the solution is therefore a vector
of length 400, it's more natural to think of it as a sort of
table of 100 items, each with 4 components. You can use that
idea to organize your solution data files; in other words, your
data files can each have 100 lines, containing 4 values on each line.
As long as every line has the same number of values, and every
data file has the same form, the program can figure out what's
going on.
The program assumes that each solution vector is stored in a separate
data file and that the files are numbered consecutively, such as
data01.txt, data02.txt, ... In a data file, comments
(beginning with '#') and blank lines are allowed. Except for
comment lines, each line of the file is assumed to represent all
the component values of the solution at a particular node.
Here, for instance, is a tiny data file for a problem with just
3 nodes, and 4 solution components at each node:
# This is solution file number 1
#
1 2 3 4
5 6 7 8
9 10 11 12
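A reader for this layout could look roughly like the following sketch. It is not the program's own R8TABLE code; the names Table and read_table are hypothetical, and storing the values node by node in one long vector is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container: rows = nodes, cols = components per node,
// values = all entries, stored node by node.
struct Table {
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> values;
};

Table read_table(std::istream& in) {
  Table t;
  std::string line;
  while (std::getline(in, line)) {
    // Skip blank lines and comment lines that begin with '#'.
    const std::size_t first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos || line[first] == '#') continue;
    std::istringstream fields(line);
    std::size_t count = 0;
    double value = 0.0;
    while (fields >> value) {
      t.values.push_back(value);
      ++count;
    }
    if (count == 0) throw std::runtime_error("line holds no numeric data");
    if (t.rows == 0) {
      t.cols = count;  // the first data line fixes the component count
    } else if (count != t.cols) {
      throw std::runtime_error("lines have differing numbers of values");
    }
    ++t.rows;
  }
  assert(t.values.size() == t.rows * t.cols);
  return t;
}
```

Applied to the tiny file above (for example through a std::ifstream), this reports 3 rows and 4 columns, and returns the 12 values as one solution vector.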
The program is interactive, but requires only a very small
amount of input:
-
L, the number of basis vectors to be extracted from the data;
-
the name of the first input data file in the first set.
-
the name of the first input data file in the second set, if any.
(You are allowed to define a master data set composed of several
groups of files, each consisting of a sequence of consecutive
file names.)
-
a BLANK line, when there are no more sets of data to be added.
-
"Y" if the output files may include some initial comment lines,
which will be indicated by initial "#" characters.
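Because only the first file name of each set is entered, the remaining names have to be generated. The routine list names FILE_NAME_INC and FILE_EXIST for this purpose; the sketch below is not that code, only one plausible way to step through consecutive names. The function next_file_name is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: increment the digits in a file name, working from
// right to left with carry, so "data09.txt" becomes "data10.txt".
// Returns an empty string if the name has no digits or every digit wraps.
// Assumption: all digits in the name belong to the counter; a name such as
// "run2_data99.txt" would carry into the "2".
std::string next_file_name(std::string name) {
  for (std::size_t i = name.size(); i-- > 0;) {
    if (!std::isdigit(static_cast<unsigned char>(name[i]))) continue;
    if (name[i] != '9') {
      ++name[i];
      return name;
    }
    name[i] = '0';  // 9 wraps to 0 and the carry moves one digit left
  }
  return "";
}
```

A driver would presumably start from the name the user typed, read that file, and keep calling next_file_name until the generated name does not correspond to an existing file.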
The program computes L basis vectors,
and writes each one to a separate file, starting with svd_001.txt,
svd_002.txt and so on. The basis vectors are written with the
same component and node structure that was found in the
solution files. Each vector will have unit Euclidean norm.
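The output naming and layout can be sketched as follows. This is an illustration rather than the program's BASIS_WRITE routine: the function names, the comment text, and the number of digits printed are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: name of the k-th output file, e.g. "svd_001.txt".
std::string basis_file_name(int k) {
  std::ostringstream name;
  name << "svd_" << std::setw(3) << std::setfill('0') << k << ".txt";
  return name.str();
}

// Hypothetical helper: lay a basis vector out with one node per line and
// `cols` component values on each line, matching the input file structure.
std::string format_basis(const std::vector<double>& v, std::size_t cols,
                         bool comments) {
  assert(cols > 0 && v.size() % cols == 0);
  std::ostringstream out;
  if (comments) {
    out << "#  Basis vector: " << v.size() / cols << " nodes, " << cols
        << " components per node\n";
  }
  out << std::setprecision(17);  // enough digits to round-trip a double
  for (std::size_t i = 0; i < v.size(); ++i) {
    out << v[i] << ((i + 1) % cols == 0 ? '\n' : ' ');
  }
  return out.str();
}
```

The returned text can be written to the file named by basis_file_name(k) with a std::ofstream. Passing comments = false corresponds to answering anything other than "Y" to the last prompt.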
Related Data and Programs:
BLAS1
is a C++ library containing an implementation of the
Level 1 Basic Linear Algebra Subprograms,
which are used by this program. To build a copy of SVD_BASIS
requires access to a compiled copy of the BLAS1 library.
BURGERS
is a dataset directory which contains a set of 40 successive
solutions to the Burgers equation. This data can be analyzed
using SVD_BASIS.
LINPACK
is a C++ linear algebra library which
supplies the routine DSVDC, needed by this program.
To build a copy of SVD_BASIS requires access to a compiled
copy of the LINPACK library.
SVD_BASIS is also available in
a FORTRAN90 version and
a MATLAB version.
SVD_DEMO_WEIGHT
is an executable FORTRAN90 program which
is similar to SVD_BASIS, but which allows the user to
assign weights to each data vector.
SVD_DEMO
is an executable C++ program which demonstrates
the singular value decomposition for a simple example.
TABLE
is a file format which is used to store the input and output files
used by the program.
Reference:
-
Edward Anderson, Zhaojun Bai, Christian Bischof, Susan Blackford,
James Demmel, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum,
Sven Hammarling, Alan McKenney, Danny Sorensen,
LAPACK Users' Guide,
Third Edition,
SIAM, 1999,
ISBN: 0898714478,
LC: QA76.73.F25L36
-
Gal Berkooz, Philip Holmes, John Lumley,
The proper orthogonal decomposition in the analysis
of turbulent flows,
Annual Review of Fluid Mechanics,
Volume 25, 1993, pages 539-575.
-
John Burkardt, Max Gunzburger, Hyung-Chun Lee,
Centroidal Voronoi Tessellation-Based Reduced-Order
Modelling of Complex Systems,
SIAM Journal on Scientific Computing,
Volume 28, Number 2, 2006, pages 459-484.
-
Lawrence Sirovich,
Turbulence and the dynamics of coherent structures, Parts I-III,
Quarterly of Applied Mathematics,
Volume XLV, Number 3, 1987, pages 561-590.
List of Routines:
-
MAIN is the main program for SVD_BASIS.
-
BASIS_WRITE writes a basis vector to a file.
-
CH_CAP capitalizes a single character.
-
CH_EQI is true if two characters are equal, disregarding case.
-
CH_IS_DIGIT returns TRUE if a character is a decimal digit.
-
CH_TO_DIGIT returns the integer value of a base 10 digit.
-
DIGIT_INC increments a decimal digit.
-
DIGIT_TO_CH returns the base 10 digit character corresponding to a digit.
-
FILE_COLUMN_COUNT counts the number of columns in the first line of a file.
-
FILE_EXIST reports whether a file exists.
-
FILE_NAME_INC generates the next file name in a series.
-
FILE_ROW_COUNT counts the number of row records in a file.
-
I4_HUGE returns a "huge" I4 value.
-
I4_INPUT prints a prompt string and reads an integer from the user.
-
R8_EPSILON returns the roundoff unit for R8 arithmetic.
-
R8MAT_PRINT prints an R8MAT, with an optional title.
-
R8MAT_PRINT_SOME prints some of an R8MAT.
-
R8TABLE_DATA_READ reads the data from an R8TABLE file.
-
R8TABLE_HEADER_READ reads the header from an R8TABLE file.
-
S_LEN_TRIM returns the length of a string to the last nonblank.
-
S_TO_R8 reads an R8 from a string.
-
S_TO_R8VEC reads an R8VEC from a string.
-
S_TO_I4 reads an I4 from a string.
-
S_WORD_COUNT counts the number of "words" in a string.
-
SINGULAR_VECTORS computes the desired singular vectors.
-
TIMESTAMP prints the current YMDHMS date as a time stamp.
-
TIMESTRING returns the current YMDHMS date as a string.
Last revised on 22 September 2005.