Linear Algebra Glossary


This file defines common terms from linear algebra.

TABLE OF CONTENTS

A-Orthogonal Vectors

Two vectors u and v are said to be A-orthogonal if

( u, A * v ) = 0.

Here A should be a positive definite symmetric matrix, which in turn guarantees that the expression ( u, A * v ) may be regarded as an inner product of the vectors u and v, with the usual properties.

This concept is useful in the analysis of the conjugate gradient method.

Back to TABLE OF CONTENTS.

Adjacency Matrix

An adjacency matrix of an (undirected) graph is a matrix whose order is the number of nodes, and whose entries record which nodes are connected to each other by a link or edge of the graph.

If two nodes I and J are connected by an edge, then Ai,j=Aj,i=1. All other entries of the matrix are 0. Thus, an adjacency matrix is a zero-one matrix. The usual convention is that a node is not connected to itself, and hence the diagonal of the matrix is zero.

The product A2=A*A is a matrix which records the number of paths between nodes I and J. If it is possible to reach one node from another, it must be possible in a path of no more than n-1 links. Hence, the reachability matrix, which records whether it is possible to get from node I to node J in one or more steps, can be determined by taking the logical sum of the matrices I, A, A2, ..., An-1.

(Loops:) If edges are allowed which begin and end at the same node, (sometimes called loops) then the adjacency matrix can have some 1's on the diagonal.

(Directed graphs:) If edges are assigned a direction, then it is possible that there is an edge from node I to node J, but not in the reverse direrection. In this case of a "directed graph", an adjacency matrix can be defined as well. The only difference is that the adjacency matrix now need not be symmetric. The reachability matrix can still be defined in the same way.

(Multigraphs:) If multiple edges are allowed between two nodes, (which might be distinguished by color in a drawing), then the adjacency matrix can record this information by storing in Ai,j the number of edges between the two nodes. The reachability matrix can still be defined in the same way, and still counts the number of different routes correctly.

Back to TABLE OF CONTENTS.

Adjoint Matrix

The adjoint matrix of a square matrix A has the property that:

A * adjoint ( A ) = adjoint ( A ) * A = det(A) * I.

Note that, if A is not invertible, then det(A)=0. We can take the convention that in this case, the adjoint of the matrix is the zero matrix.

The adjoint of a matrix A is "almost" its inverse. If we know that A is invertible, then the inverse of A, denoted inverse ( A ) or A-1, can be written explicitly as:

A-1 = ( 1 / det(A) ) * adjoint ( A ).

This formula is only of use if we can figure out how to evaluate the things on the right hand side. The first factor is not so bad, since we do know a formula for the determinant of a matrix.

But there is also a way of evaluating the second factor. The adjoint matrix is defined in terms of the cofactor matrix of A:

adjoint ( A ) = ( cofactor ( A ) )'.

Back to TABLE OF CONTENTS.

Alternating Sign Matrix

An alternating sign matrix is an integer matrix with the properties that

Example:

         0   1   0   0   0
         1  -1   0   1   0
         0   1   0  -1   1
         0   0   0   1   0
         0   0   1   0   0
      

Obviously, alternating sign matrices include the identity matrix and any permutation matrix. From the definitions, you should see that the first row must contain a single entry which is 1, with the other values being 0. The nonzeroes in the second row can be a single 1, or the values, 1, -1, 1, in that order, with some intervening zeroes possible. In the third row, the value -1 may occur up to as many times as there were 1's in preceding rows, which means the most interesting row could be 1, -1, 1, -1, 1, -1, 1. Thus the number of possible nonzeroes grows until the central row of the matrix is reached. Since the same restrictions apply from the bottom reading up, the number of possible nonzeroes must now decrease. Similar reasoning controls the nonzero population of the columns.

If we let An denote the number of distinct alternating sign matrices of order n, then it has only recently been proved that

An = Product ( 0 <= I <= N-1 ) (3*I+1)! / (N+I)!
giving the sequence 1, 2, 7, 42, 429, 7436, 218348, 10850216, ...

Reference:

  1. David Robbins,
    The Story of 1, 2, 7, 42, 429, 7436, ...,
    Mathematical Intelligencer,
    Volume 13, Number 2, pages 12-19.
  2. David Bressoud,
    Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture,
    Cambridge University Press, 1999.

Back to TABLE OF CONTENTS.

Anticirculant Matrix

An anticirculant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the left, with the first value "wrapping around" to the end.

Here is an example of an anticirculant matrix:

        1 2 3 4
        2 3 4 1
        3 4 1 2
        4 1 2 3
      
and here is an example of a rectangular anticirculant matrix:
        1 2 3 4 5
        2 3 4 5 1
        3 4 5 1 2
      

Simple facts about a anticirculant matrix A:

Back to TABLE OF CONTENTS.

Antisymmetric Matrix

A square matrix A is antisymmetric if it is equal to the negative of its transpose:

A = - A'.

Every matrix A can be decomposed into the sum of an antisymmetric and a symmetric matrix:

A = B + C = (1/2) * ( ( A - A' ) + ( A + A' ) )

Simple facts about an antisymmetric matrix A:

An antisymmetric matrix is also called skew symmetric.

In complex arithmetic, the corresponding object is a skew Hermitian matrix.

Back to TABLE OF CONTENTS.

Arnoldi Iteration

The Arnoldi iteration is an iterative method that seeks to approximate some of the eigenvalues of a very large matrix.

If the matrix is symmetric or Hermitian, then the Arnoldi iteration is usually reformulated into the more efficient Lanczos iteration.

We start by considering the unitary similarity transform that converts the m by m matrix A into upper Hessenberg form:


        A = Q H Q*
      
Now we let Qn indicate the m by n matrix formed from the first n columns of Q; similarly, Hn is the n+1 by n+1 submatrix of H. Then we write

        A Qn = Qn+1 Hn
      
In particular, we can look at the subsystem involving the n-th column of Qn:

        A*qn = h1,n q1 + ... 
          + hn,n qn
          + hn+1,n qn+1
      

This can be interpreted as a recursive relationship for qn+1 in terms of the previous vectors. The Arnoldi iteration allows us to build up the matrices Qn and Hn. For reasons not completely understood, even for n much smaller than m, the eigenvalues of Hn will tend to some of the eigenvalues of A, and typically the extreme ones.

The new vector qn+1 is a Krylov vector; actually, because of orthogonalization, it represents the new component introduced into the Krylov subspace by the product A*qn.

Thus, a typical procedure is to compute Hn and apply a standard eigenvalue procedure for Hessenberg matrices to extract its eigenvalues.

An efficient form for the Arnoldi iteration is:

        Let b be an arbitrary initial vector;
        Set q(1) = b / ||b||;
        for n = 1, ...
          v = A * q(n)
          for j = 1 to n
            h(j,n) = conjugate ( q(j) ) * v
            v = v - q(j) * h(j,n)
            h(n+1,n) = ||v||
            q(n+1) = v / h(n+1,n)
          end for
        end for
      

Back to TABLE OF CONTENTS.

Band Matrix

A band matrix is a matrix whose entries are all zero except for the diagonal and a few of the immediately adjacent diagonals.

If the band matrix is large enough, then many significant efficiencies can be achieved in storage and matrix operations. Because so many elements are zero, the band matrix can be stored more compactly using band matrix storage. And because so many elements are zero, many algorithms can be speeded up to execute more quickly, including matrix multiplication and Gauss elimination.

Special cases of band matrices include those which are also symmetric or positive definite.

Band matrices with very tight bandwidths are of interest, and include matrices which are diagonal, bidiagonal, tridiagonal, and in some cases pentadiagonal.

Here is an example of a band matrix:

        11 12  0  0  0  0
        21 22 23  0  0  0
        31 32 33 34  0  0
         0 42 43 44 45  0
         0  0 53 54 55 56
         0  0  0 64 65 66
      
This matrix has an upper bandwidth of 1, and lower bandwidth of 2, and an overall bandwidth of 4.

In some cases, a band matrix can be LU-factored without pivoting. This is always true if the matrix is positive definite. If a banded matrix A with lower and upper bandwidths ml and mu can be LU factored without pivoting, then its factorization can be written as

        A = L * U
      
The usual P factor is missing because there is no pivoting, but more is true. The factor L, which is unit lower triangular, is in fact also a band matrix, with lower bandwith ml, and U, the upper triangular factor, is also band matrix with upper bandwith mu.

If the matrix A is positive definite symmetric banded, with a half-bandwidth of m, then the banded LU factorization becomes:

        A = L * U
      
with L and U having m as their lower and upper bandwidths respectively. Moreover, if we "factor out" the non-unit diagonal of U by writing
        U = D * V
      
where V is unit upper triangular, we may write
        A = L * U = L * D * V = L * D2 * D2 * V = R' * R 
      
where D2 is the obvious "square root" of D (which must have positive entries because A is positive definite, remember!), and R is an upper triangular banded matrix with upper bandwidth m. Another way of looking at this is to say that the Cholesky factor of a banded positive definite symmetric matrix is itself banded with the same (half) bandwidth as the original matrix.

LAPACK and LINPACK include special routines for a variety of band matrices. These routines can compute the LU factorization, the determinant, inverse, or solution of a linear system.

LAPACK and EISPACK have routines for computing the eigenvalues of a symmetric banded matrix.

Back to TABLE OF CONTENTS.

Band Matrix Storage

Band matrix storage is a matrix storage format suitable for efficiently storing the nonzero entries of a band matrix.

Most band storage schemes are column oriented. The nonzero entries of the matrix "slide" downwards, while remaining in their original column. If the original matrix looked like this:

        11 12  0  0  0  0
        21 22 23  0  0  0
        31 32 33 34  0  0
         0 42 43 44 45  0
         0  0 53 54 55 56
         0  0  0 64 65 66
      
the matrix would be saved in column band matrix storage as:
 
         0 12 23 34  0  0
        11 22 33 44 45  0
        21 32 43 54 55 56
        31 42 53 64 65 66
      
Note that the zeroes in the above array are there just as padding. They don't correspond to any entries of the original array, and are simply necessary to make the array rectangular.

If the matrix is to be handled by a Gauss elimination routine that uses pivoting, then there is a possibility of fill in; that is, nonzero entries may need to be stored in places where zeroes had been. Band matrix storage can still be used, but we need to include in the compressed matrix some extra entries representing the diagonals along which the fill in entries may occur. It turns out that the number of extra diagonals required is simply the number of nonzero subdiagonals in the original matrix. For our example, this would mean the matrix would be stored as:

 
         0  0  0  0  0  0
         0  0  0  0  0  0
         0 12 23 34  0  0
        11 22 33 44 45  0
        21 32 43 54 55 56
        31 42 53 64 65 66
      

Back to TABLE OF CONTENTS.

Bandwidth

The bandwidth of a band matrix is, roughly speaking, the number of diagonals that contain nonzero entries.

More precisely, define ML, the lower bandwidth of a matrix A to be the maximum value of ( I - J ), and MU to be the maximum value of ( I - J ), for all nonzero matrix entries Ai,j. Then the bandwidth M is defined by:

M = ML + 1 + MU.

This definition always treats the (main) diagonal as nonzero, and is not misled by a matrix which has only two nonzero diagonals, which are actually widely separated. All the territory between the diagonals must be included when measuring bandwidth.

Back to TABLE OF CONTENTS.

Basis

A basis for a linear space X of dimension N is a set of N vectors, {v(i) | 1 <= i <= N } from which all the elements of X can be constructed by linear combinations.

Naturally, we require that each of the vectors v(i) be an element of the space X. Moreover, it is not enough that these vectors span the space; we also require that they be linearly independent, that is, there should be no redundant vectors in the set.

The columns of the identity matrix form a basis for the linear space of vectors of dimension N. A square matrix of order N is not defective exactly when its eigenvectors form a basis for the linear space of vectors of dimension N.

Given a particular basis, the representation of a vector x is the unique set of coefficients c(i) so that

x = Sum ( 1 <= I <= N ) c(i) * v(i)
The coefficients must be unique, otherwise you can prove that the basis is not linearly independent!

If the basis vectors are pairwise orthogonal, then the basis is called an orthogonal basis. If the basis vectors have unit length in the Euclidean norm, the basis is a normal basis. If both properties apply, it is an orthonormal basis. The columns of an orthogonal matrix are an orthonormal basis for the linear space of vectors of dimension N.

Back to TABLE OF CONTENTS.

Bidiagonal Matrix

A bidiagonal matrix has only two nonzero diagonals. The matrix is called upper bidiagonal if these are the main diagonal and the principal superdiagonal. The matrix is called lower bidiagonal if these are the main diagonal and the principal subdiagonal.

A simple example of an upper bidiagonal matrix is:

        1 2 0 0 0
        0 3 4 0 0
        0 0 5 6 0
        0 0 0 7 8
        0 0 0 0 9
      

The Jordan Canonical Form is an example of an upper bidiagonal matrix.

A bidiagonal matrix is automatically a:

with all the rights and privileges appertaining thereunto.

If a tridiagonal matrix can be LU-factored without pivoting, then the L factor is lower bidiagonal, and the U factor is upper bidiagonal.

Back to TABLE OF CONTENTS.

Basic Linear Algebra Subprograms (BLAS)

The BLAS or Basic Linear Algebra Subprograms, are a set of routines offering vector and matrix utilities. They are extensively used as part of LINPACK and LAPACK, to simplify algorithms, and to make them run more quickly.

The Level 1 BLAS provide basic vector operations such as the dot product, vector norm, and scaling. Level 2 BLAS provide operations involving a matrix and a vector, and Level 3 BLAS provide matrix-matrix operations.

There are also sets of sparse BLAS and parallel BLAS available.

Here are the Level 1 BLAS routines for single precision real vectors:

Back to TABLE OF CONTENTS.

Block Matrix

A block matrix is a matrix which is described as being built up of smaller matrices.

For example, a tridiagonal block matrix might look like this:

        2 4 | 3 9 | 0 0
        4 6 | 0 3 | 0 0
        ---------------
        1 0 | 2 4 | 3 9
        5 5 | 4 6 | 0 3
        ---------------
        0 0 | 1 0 | 2 4
        0 0 | 5 5 | 4 6
      
but for certain purposes, it might help us to see this matrix as "really" being a tridiagonal matrix, whose elements are themselves little matrices:
        a | b | 0
        ---------
        c | a | b
        ---------
        0 | c | a
      

An algorithm suitable for a tridiagonal matrix can often be extended, in a natural manner, to handle a block tridiagonal matrix. Similar extensions can be made in some cases for other types of block matrices. A block banded matrix can be factored by a variant of banded Gauss elimination, for instance.

Back to TABLE OF CONTENTS.

Border Banded Matrix

A border banded matrix is a 2 by 2 block matrix comprising a (large) leading block which is a square banded matrix, two dense rectangular side strips, and a (small) trailing block which is a square dense matrix.

For example, a "toy" border banded matrix might look like this:

         2 -1  0  0  0  0  0 | 1 2
        -1  2 -1  0  0  0  0 | 2 5
         0 -1  2 -1  0  0  0 | 7 8
         0  0 -1  2 -1  0  0 | 3 3
         0  0  0 -1  2 -1  0 | 4 2
         0  0  0  0 -1  2 -1 | 3 1
         0  0  0  0  0 -1  2 | 7 8
         -------------------------
         3  7  8  3  2  3  1 | 5 2
         1  2  4  7  9  2  4 | 3 6
      
which we can regard as being made up of the blocks:
A11 | A12
---------
A21 | A22
where, as we specified, A11 is a square banded matrix, A22 is a square dense matrix, and A21 and A12 are rectangular strips.

It is desirable to take advantage of the banded structure of A11. We can specify an algorithm for solving a linear system A * x = b that can be written in terms of operations involving the sub-matrices A11, A12, A21 and A22, which will achieve this goal, at the expense of a little extra work. One problem with this technique is that it will fail if certain combinations of the matrices A11 and A22 are singular, which can happen even when A is not singular.

The algorithm for solving A * x = b rewrites the system as:

A11 * X1 + A12 * X2 = B1
A21 * X1 + A22 * X2 = B2
The first equation can be solved for X1 in terms of X2:
X1 = - A11-1 * A12 * X2 + A11-1 * B1
allowing us to rewrite the second equation for X2:
( A22 - A21 * A11-1 * A12 ) X2 = B2 - A21 * A11-1 * B1
which can be solved as:
X2 = ( A22 - A21 * A11-1 * A12 )-1 * ( B2 - A21 * A11-1 * B1 )

The actual algorithm doesn't compute the inverse, of course, but rather factors the matrices A11 and A22 - A21 * A11-1 * A12.

Back to TABLE OF CONTENTS.

Cartesian Basis Vectors

The Cartesian basis vectors are simply the N columns of the identity matrix, regarded as individual column vectors.

These vectors form the standard basis for the set of vectors RN. The vector corresponding to the I-th column of the identity matrix is often symbolized by ei.

Thus, if we are working in a space with dimension N of 4, the basis vectors e1 through e4 would be:

        1     0     0    0
        0     1     0    0
        0     0     1    0
        0     0     0    1
      

Facts about the Cartesian basis vectors:

Back to TABLE OF CONTENTS.

The Cauchy-Schwarz Inequality

The Cauchy-Schwarz Inequality is a relationship between a vector inner product and a vector norm derived from that inner product. In particular, if the norm ||*|| is defined by an inner product (*,*) as follows:

|| x || = sqrt ( x, x ),
then the Cauchy-Schwarz inequality guarantees that for any vectors x and y it is the case that:
| ( x, y ) | <= || x || * || y ||.

Back to TABLE OF CONTENTS.

The Cayley-Hamilton Theorem

The Cayley-Hamilton Theorem guarantees that every (square) matrix satisfies its own characteristic equation.

For example, if A is the matrix:

        2 3
        1 4
      
then the characteristic equation is
lambda2 - 6 * lambda + 5 = 0.
which is not true for all values lambda, but just a few special values known as eigenvalues. The Cayley-Hamilton theorem guarantees that the matrix version of the characteristic equation, with A taking the place of lambda, is guaranteed to be true:
A2 - 6 * A + 5 * I = 0.

Back to TABLE OF CONTENTS.

CentroSymmetric Matrix

A centrosymmetric matrix is one which is symmetric about its center; that is,

Ai,j = Am+1-i,n+1-j

Example:

         1 10  8 11  5
        13  2  9  4 12
         6  7  3  7  6
        12  4  9  2 13
         5 11  8 10  1
      

A centrosymmetric matrix A satisfies the following equation involving the Exchange matrix J:

J*A*J = A
.

Back to TABLE OF CONTENTS.

Characteristic Equation

The characteristic equation of a (square) matrix A is the polynomial equation:

det ( A - lambda * I ) = 0
where lambda is an unknown scalar value.

The left hand side of the equation is known as the characteristic polynomial of the matrix. If A is of order N, then there are N roots of the characteristic equation, possibly repeated, and possibly complex.

For example, if A is the matrix:

        2 3
        1 4
      
then the characteristic equation is
            (    2 - lambda     3           )
        det (    1              4 - lambda  )  = 0
      
or
lambda2 - 6 * lambda + 5 = 0.
This equation has roots lambda = 1 or 5.

Values of the scalar lambda which satisfy the characteristic equation are known as eigenvalues of the matrix.

Some facts about the characteristic equation of A:

The Cayley-Hamilton Theorem guarantees that the matrix itself also satisfies the matrix version of its characteristic equation.

Back to TABLE OF CONTENTS.

Cholesky Factorization

The Cholesky factorization of a positive semidefinite symmetric matrix A has the form:

A = L * L'
where L is a lower triangular matrix, or equivalently:
A = R' * R
where R is an upper triangular matrix.

If a matrix is symmetric, then it is possible to determine whether or not the matrix is positive definite simply by trying to compute its Cholesky factorization: if the matrix has a zero eigenvalue, then it is positive semidefinite, and the algorithm should theoretically spot this by computing a zero diagonal element; if the matrix actually has a negative eigenvalue, then at a particular point in the algorithm, the square root of a negative number will be computed.

Software to compute the Cholesky factorization often saves space by using symmetric matrix storage, and overwriting the original matrix A by its Cholesky factor L.

As long as the matrix A is positive definite, the Cholesky factorization can be computed from an LDL factorization, or, if no pivoting was done, from an LU factorization.

The Cholesky factorization can be used to compute the square root of the matrix.

The LINPACK routines SCHDC, SCHUD, SCHDD, SCHEX, SPOCO, SPOFA, SPODI, and SPOSL compute and use the Cholesky factorization.

Back to TABLE OF CONTENTS.

Circulant Matrix

A circulant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the right, with the end value "wrapping around".

Here is a square circulant matrix:

        1 2 3 4
        4 1 2 3
        3 4 1 2
        2 3 4 1
      
a "wide" rectangular circulant matrix:
        1 2 3 4 5
        5 1 2 3 4
        4 5 1 2 3
      
a "tall" rectangular circulant matrix:
        1 2 3
        5 1 2 
        4 5 1
        3 4 5
        2 3 4
      

Simple facts about a (rectangular) circulant matrix A:

Simple facts about a square circulant matrix A:

Compare the concept of an anticirculant matrix.

Back to TABLE OF CONTENTS.

Cofactor Matrix

The cofactor matrix of a square matrix A is generally used to define the adjoint matrix, or to represent the determinant.

For a given matrix A, the cofactor matrix is the transpose of the adjoint matrix:

cofactor ( A ) = ( adjoint ( A ) )'

The determinant det(A) can be represented as the product of each of the entries of any given row or column times their corresponding cofactor entries. In particular, consider the first row:

det(A) = A(1,1) * cofactor(A)(1,1) + A(1,2) * cofactor(A)(1,2) + ... + A(1,N) * cofactor(A)(1,N)

The formula for the (I,J) entry of the cofactor matrix of A is:

cofactor(A)(I,J) = (-1)(I+J) * det ( M(A,I,J) )
where M(A,I,J) is the minor matrix of A, constructed by deleting row I and column J.

Back to TABLE OF CONTENTS.

Column Echelon Form

Column echelon form is a special matrix structure which is usually arrived at by Gauss elimination.

Any matrix can be transformed into this form, using a series of elementary column operations. Once the form is computed, it is easy to compute the determinant, inverse, the solution of linear systems (even for underdetermined or overdetermined systems), the rank, and solutions to linear programming problems.

A matrix (whether square or rectangular) is in column echelon form if:

A matrix is in column reduced echelon form if it is in column echelon form, and it is also true that:

Column echelon form is primarily of use for teaching, and analysis of small problems, using exact arithmetic. It is of little interest numerically, because very slight errors in numeric representation or arithmetic can result in completely erroneous results.

Back to TABLE OF CONTENTS.

Commuting Matrices

Two square matrices, A and B, are said to commute if

A * B = B * A

Facts about commuting matrices:

Back to TABLE OF CONTENTS.

Companion Matrix

The companion matrix for a monic polynomial P(X) of degree N is a matrix of order N whose characteristic polynomial is P(X).

If the polynomial P(X) is represented as:

P(X) = XN + C(1) * X(N-1) + ... + C(N-1) * X + C(N).
then the companion matrix for this polynomial has the form:
        0 0 0 ... 0 0 -C(N)
        1 0 0 ... 0 0 -C(N-1)
        0 1 0 ... 0 0 -C(N-2)
        ...................
        0 0 0 ... 1 0 -C(2)
        0 0 0 ... 0 1 -C(1)
      

Note that the determinant of the companion matrix is either -C(N) or +C(N), so it is never zero unless the polynomial itself is degenerate (has a zero leading coefficient).

The characteristic polynomial of a companion matrix is, in fact, the polynomial that was used to define the matrix originally. Thus it is always possible to construct a matrix with any desired set of eigenvalues, by constructing the corresponding characteristic polynomial, and then the companion matrix.

The companion matrix can also be used to perform a decomposition of a matrix A. If x is a vector, and K the Krylov matrix

K = Krylov ( A, x, n )
whose columns are the successive products x, A*x, A2*x, and so on, and if K is nonsingular, then
A = K * C * K-1
where the matrix C is the companion matrix of A.

(There are several equivalent forms of the companion matrix, with the coefficients running along the top, the bottom, or the first column of the matrix.)

Back to TABLE OF CONTENTS.

Compatible Norms

A matrix norm and a vector norm are compatible if it is true, for all vectors x and matrices A that

||A*x|| <= ||A|| * ||x||
In some texts, the word consistent is used in this sense, instead of compatible.

In particular, if you have not verified that a pair of norms are compatible, then the above inequality is not guaranteed to hold. For any vector norm, it is possible to define at least one compatible matrix norm, namely, the matrix norm defined by:

||A|| = supremum ||A*x|| / ||x||
where the supremum (roughly, the "maximum") is taken over all nonzero vectors x. If a matrix norm can be derived from a vector norm in this way, it is termed a vector-bound matrix norm. Such a relationship is stronger than is required by compatibility.

If a matrix norm is compatible with some vector norm, then it is also true that

||A*B|| <= ||A|| * ||B||
where both A and B are matrices.

Back to TABLE OF CONTENTS.

Complex Number Representation

Complex numbers have the form a+bi, where i is a special quantity with the property that i2=-1.

It is possible to devise real matrices that behave like complex numbers. Let the value "1" be represented by the identity matrix of order 2, and the value "i" be represented by

        0  1
       -1  0
      
Then it is easy to show that these matrices obey the rules of complex numbers. In particular, "i" * "i" = - "1". In general, the complex number a+bi is represented by
        a  b
       -b  a
      
and multiplication and inversion have the correct properties.

Back to TABLE OF CONTENTS.

Condition Number

The condition number of the coefficient matrix A of a linear system is a (nonnegative) number used to estimate the amount by which small errors in the right hand side b, or in A itself, can change the solution x.

This analysis ignores arithmetic roundoff, which is hard to analyze, and focusses on easily measurable quantities known beforehand, and how they will amplify or diminish the roundoff errors.

Small values of the condition number suggest that the algorithm will not be sensitive to errors, but large values indicate that small data or arithmetic errors may explode into enormous errors in the answer.

The condition number is defined in terms of a particular matrix norm. Many different matrix norms may be chosen, and the actual value of the condition number will vary depending on the norm chosen. However, the general rule that large condition numbers indicate sensitivity will hold true no matter what norm is chosen.

The condition number for a matrix A is usually defined as

condition ( A ) = || A || * || A-1 ||.
If A is not invertible, the condition number is infinite.

Simple facts about the condition number:

LINPACK routines such as SGECO return RCOND, an estimate of the reciprocal of the condition number in the L1 matrix norm.

Turing's M condition number, M(A), for a matrix of order N, is defined as

M(A) = N * max | Ai,j | * max | A-1i,j |.

Turing's N condition number, N(A) is

N(A) = Frob ( A ) * Frob ( A-1 ) / N
where Frob(A) is the Frobenius matrix norm.

The Von Neumann and Goldstine P condition number is

P(A) = | lambda_Max / lambda_Min |
where lambda_Max and lambda_Min are the eigenvalues of largest and smallest magnitude, which is equivalent to using the spectral radius of A and A-1.

There is also a condition number defined for the eigenvalue problem, which attempts to estimate the amount of error to be expected when finding the eigenvalues of a matrix A.

Back to TABLE OF CONTENTS.

Congruent Matrix

Congruent matrices A and B are related by a nonsingular matrix P such that

A = P' * B * P.

Congruent matrices have the same inertia.

Congruence is of little interest by itself, but the case where P is also an orthogonal matrix is much more important.

Back to TABLE OF CONTENTS.

Conjugate Gradient Method

The conjugate gradient method is designed to solve linear systems

A * x = b
when the matrix A is symmetric, and positive definite.

The classical method is not an iterative method, but rather a direct method which produces an approximation to the solution after N steps.

Because of numerical inaccuracies and instabilities, many implementations of the method repeat the computation several times, until the residual error is deemed small enough.

On the other hand, if N is very large, and a suitable preconditioner is used, it is possible to treat the conjugate gradient method as "essentially" an iterative method, in the sense that it may be the case that a good approximate solution can be found by stopping the algorithm early. The matrix is assumed to be symmetric positive definite; if it is also large and sparse, a suitable preconditioner may be the Incomplete Cholesky Factorization. In this case, the algorithm is sometimes referred to as ICCG(0): that is, the Incomplete Cholesky - Conjugate Gradient algorithm.

The method is ideally suited for use with large sparse systems, because the matrix A is only accessed to compute a single matrix-vector product on each step. This involves no fill in or overwriting of the data structure that describes A. On the other hand, if the matrix A is dense, the conjugate gradient method costs roughly 3 times the number of operations for direct Gauss elimination.

The conjugate gradient method can be considered as a minimization of the functional f(x), defined by

f(x) = x' * ( 0.5 * A * x - b )
which achieves its minimum value when x solves the linear system.

Here are the formulas for the basic conjugate gradient method. Brackets indicate the value of an iterative quantity. X[0] is the initial value of the vector X, X[1] the value after one iteration, and so on.

        X[0] = 0
        For K = 1 to N
      
Compute the residual error:
        R[K-1] = B - A * X[K-1]
      
Compute the direction vector:
        If K = 1 then
          P[K] = R[0]
        else
          BETA = - P'[K-1] * A * R[K-1] 
               / ( P'[K-1] * A * P[K-1] )
          P[K] = R[K-1] + BETA * P[K-1]
        end if
      
Compute the location of the next iterate:
        ALPHA = ( R'[K-1] ) * R[K-1] / ( P'[K] * A * P[K] )
        X[K] = X[K-1] + ALPHA * P[K]
      end for
      

Conjugate gradient algorithms are available in IMSL, ITPACK and NSPCG.

Back to TABLE OF CONTENTS.

Conjugate Matrix

The conjugate matrix of a complex matrix A, denoted by A* or conjugate ( A ), is the matrix obtained by replacing each entry of A by its complex conjugate.

(In this document, the form conjugate ( A ) is preferred, because the A* is easily confused with multiplication.

The complex conjugate transpose, sometimes called the Hermitian or tranjugate of A, is derived from A by complex conjugation, followed by transposition, and is denoted by AH.

Back to TABLE OF CONTENTS.

Conjugate of a Complex Number

The conjugate of a complex number z = a + b * i is the complex number

conjugate ( z ) = a - b * i.

The conjugate is frequently represented by placing a bar over the quantity, or occasionally a star after it, as in z*.

The complex conjugate can be used in a formula for the norm or magnitude of a complex number, which must always be a real nonnegative value:

norm ( z ) = sqrt ( z * conjugate ( z ) ) = sqrt ( a2 + b2 ).

For complex vectors, an inner product with the correct properties may be defined as:

V dot W = ( V, W ) = sum ( 1 <= I <= N ) conjugate ( V(I) ) * W(I).
This inner product is computed in the BLAS function CDOTC, for example, and yields another relationship with the Euclidean vector norm:
|| V || = sqrt ( V dot V )

Back to TABLE OF CONTENTS.

Conjunctive Matrix

Two (complex) matrices A and B are said to be conjunctive if there is some nonsingular matrix P so that

A = ( conjugate ( P ) )' * B * P,

This is the extension to complex matrices of the concept of a congruent real matrix.

Back to TABLE OF CONTENTS.

Consistent System

A consistent linear system is an M by N linear system A * x = b, with A and b given, for which there is at least one solution vector x.

A simple example of an inconsistent system is:

        1 0 1        1
        0 2 3 * x =  2
        1 2 4        4
      
The last row of A is the sum of the previous two rows, but the last entry of b is not the sum of the previous two entries, so there can be no solution.

By contrast, the system

        1  2  3         14
        4  5  6  * x  = 34
      
is consistent, having as one solution (of many) the vector (1,2,3), and the system
        1  0          5
        1  2          5
        2  4  * x =  10
        3  6         15
        4  8         20
      
is consistent, having the unique solution ( 5, 0 )

Facts about a consistent linear system:

If a linear system is not consistent, then there is no solution x. But in such a case, it may be of interest to determine an approximate solution, which satisfies some condition, such as minimizing some norm of the residual. One technique for doing this involves the linear least squares method..

Back to TABLE OF CONTENTS.

Convergent Matrix

A convergent matrix A is a square matrix for which the limit as n goes to infinity of An is zero.

A matrix is convergent if and only if the spectral radius rho(A) satisfies

rho(A) < 1

A semiconvergent matrix A is a square matrix A for which the limit as n goes to infinity of An exists. If a matrix is semiconvergent, then either it is convergent, (rho(A) < 1) or else rho(A) = 1. In the second case, it must be further true that the only eigenvalue of norm 1 is 1, and that this eigenvalue is semisimple.

The simplest example of a semiconvergent matrix is the identity matrix.

Back to TABLE OF CONTENTS.

Correlation Matrix

A correlation matrix is a square positive semidefinite symmetric matrix with unit diagonal entries, and off-diagonal entries whose magnitude is no greater than 1.

In statistical applications, entry Ai,j represents the strength of the correlation between variables I and J. The strongest correlation is +1; a correlation of 0 means the variables are apparently completely independent. If a correlation is negative, it means that variable I is positively correlated to the negative of the value of variable J.

The entries of the correlation matrix can be represented as

Ai,j = X(I) dot X(J) / ( ||X(I)|| ||X(J)|| )
where the norm used is derived from the dot product, and the dot product takes the form appropriate for the kind of data being handled, such as finite vectors, or continuous functions.

Back to TABLE OF CONTENTS.

Cross Product

The cross product of two vectors u and v, denoted u x v, is a vector w which is perpendicular to u and v, pointing in the direction so that (u,v,w) forms a right handed coordinate system, and whose length is equal to the area of the parallelogram two of whose sides are u and v.

Algebraically,

w(1) = u(2) * v(3) - u(3) * v(2)
w(2) = u(3) * v(1) - u(1) * v(3)
w(3) = u(1) * v(2) - u(2) * v(1)

If the unit vectors in the coordinate directions are denoted by i, j and k, then the cross product vector can also be regarded as the (vector) value of the following "determinant":

                        |   i    j    k  |
        w = u x v = det | u(1) u(2) u(3) |
                        | v(1) v(2) v(3) |
      

Back to TABLE OF CONTENTS.

Cyclic Reduction

Cyclic reduction is a method for solving a linear system A*x=b in the special case where A is a tridiagonal matrix.

On a parallel computer, this method solves the system in LOG(N) "steps" where N is the order of A. A standard Gauss elimination method for a tridiagonal system would require roughly N "steps" instead.

A tridiagonal system has some very special properties that will allow us to carry this operation out. Consider this system of 7 equations:

        A11 x1 + A12 x2                                              = y1
        A21 x1 + A22 x2 + A23 x3                                     = y2
                 A32 x2 + A33 x3 + A34 x4                            = y3
                          A43 x3 + A44 x4 + A43 x5                   = y4
                                   A54 x4 + A55 x5 + A56 x6          = y5
                                            A65 x5 + A66 x6 + A67 x7 = y6
                                                     A76 x6 + A77 x7 = y7
      

The first equation can be used to eliminate the coefficient A21 in the second equation, and the third equation to eliminate the coefficient A23 in the second equation. This knocks out variables x1 and x3 in the second equation, but adds x4 into that equation.

By the same method, x3 and x5 can be eliminated from the equation for x4, and so on. By eliminating the odd variables from the even equations, a smaller tridiagonal system system is derived, with half the equations and variables.

If elimination is applied to this set, the number of equations is again reduced by half; this reduction may be repeated until a single equation in one variable is reached. Backsubstitution then produces the values of all the variables.

The reason this method might have an advantage over Gauss elimination is that, at each step of the elimination phase, the parts of the step are independent. If many computer processors are available, then each can be working on a separate portion of the elimination. If the number of processors is large enough, the system can really be solved in LOG(N) time.

Cyclic reduction routines are available in the NCAR software library, the SLATEC library, and the Cray SCILIB library.

Back to TABLE OF CONTENTS.

Cyclic Tridiagonal Matrix

A cyclic tridiagonal matrix is a generalization of a tridiagonal matrix which includes an extra last entry in the first row, and an extra first entry in the last row.

An example of a cyclic tridiagonal matrix:

        -2  1  0  0  1
         1 -2  1  0  0
         0  1 -2  1  0
         0  0  1 -2  1
         1  0  0  1 -2
      

A cyclic tridiagonal matrix is not a tridiagonal matrix. If the matrix is constant along the three generalized diagonals, a cyclic tridiagonal matrix is a circulant matrix. A cyclic tridiagonal matrix can arise in situations where a periodic boundary condition is applied.

It is very disappointing that a cyclic tridiagonal matrix is not a tridiagonal matrix, since there are so many good methods for solving tridiagonal linear systems. One way to solve a cyclic tridiagonal system is to use the Sherman Morrison Formula and view the matrix as a rank one perturbation of a tridiagonal matrix. Another approach is to view it as a border banded matrix.

A cyclic tridiagonal matrix may also be called a periodic tridiagonal matrix.

Back to TABLE OF CONTENTS.

Defective Matrix

A defective matrix is a (square) matrix that does not have a full set of N linearly independent eigenvectors.

For every eigenvalue, there is always at least one eigenvector, and eigenvectors corresponding to distinct eigenvalues are linearly independent. If the N eigenvalues of a matrix are distinct, then it surely has N linearly independent eigenvectors, and so cannot be defective.

Conversely, if a matrix is defective, then it must have at least one repeated eigenvalue, that is, an eigenvalue of algebraic multiplicity greater than 1. A matrix is defective if and only if its Jordan Canonical Form has at least one nonzero entry on the superdiagonal.

Thus, a simple example of a defective matrix is:

        1 1
        0 1
      
which has the single eigenvalue of 1, with algebraic multiplicity 2, but geometric multiplicity 1. The only eigenvector is ( 0, 1 ).

If a matrix is not defective, then its eigenvectors form a basis for the entire linear space. In other words, any vector y can be written as

y = X * c
where X is the array of eigenvectors of A.

If a matrix A is not defective, then it is similar to its diagonal eigenvalue matrix:

A = X * LAMBDA * X-1
and the similarity transformation matrix X is actually the eigenvector matrix.

This in turn allows us to make interesting statements about the inverse, tranpose, and powers of A. For instance, we see that

A2 = ( X * LAMBDA * X-1 ) * ( X * LAMBDA * X-1 )
= X * LAMBDA2 * X-1
leading us to the statement that for a nondefective matrix, the square has the same eigenvectors, and the square of the eigenvalues.

Back to TABLE OF CONTENTS.

Deflation

Deflation is a technique for "removing" a known eigenvalue from a matrix, in order to facilitate the determination of other eigenvalues.

For example, the power method is able to estimate the eigenvalue of largest modulus of a matrix A. Once this is computed, it might be desired to find the next largest eigenvalue. Deflation can be used to essentially create a new matrix, A', which has the same eigenvalues as A, except that the largest eigenvalue has been dropped (and the order of A' reduced by 1) or the largest eigenvalue is replaced by 0. In either case, the power method applied to A' will produce the next largest eigenvalue of A.

To eliminate a known eigenvalue, lambda, it is necessary to know its eigenvector x, which we will assume has been scaled to have unit Euclidean norm. By the properties of eigenvectors, we know that

A * x = lambda * x.
Now define the matrix A^ so that:
A^ = A - lambda * x * x'.
Now x is an eigenvector of A^ with eigenvalue 0, because:
A^ * x = ( A - lambda * x * x' ) * x
= A * x - lambda * x * x' * x
= lambda * x - lambda * x
= 0

If the power method is being employed, then the new iteration should try to "factor out" any component of the eigenvector x; otherwise, small errors in the computation of the first eigenvalue and eigenvector will interfere with the next results.

Theoretically, this process may be repeated as often as desired, eliminating each eigenvalue as it is discovered. Practically, however, accumulated errors in the eigenvalues and eigenvectors make the computation more and more unreliable with each step of deflation. Thus, if more than a few eigenvalues are desired, it is more appropriate to use a standard technique.

Back to TABLE OF CONTENTS.

Derogatory Matrix

A derogatory matrix is a matrix whose minimal polynomial is of lower degree than its characteristic polynomial

For this to happen, of course, the matrix must have at least one eigenvector with algebraic multiplicity greater than one.

Surprisingly, the identity matrix is derogatory, (ignoring the case N=1). A matrix can be guaranteed to be nonderogatory if the geometric multiplicity of every eigenvalue is 1.

Perhaps the only reason that the term is worth knowing is this fact: every nonderogatory matrix is similar to the companion matrix of its characteristic polynomial.

Back to TABLE OF CONTENTS.

Determinant of a Matrix

The determinant of a square matrix is a scalar value whose most common usage is to indicate whether a matrix is singular or not.

If a matrix is singular, then it doesn't have an inverse and linear systems involving this matrix cannot be reliably solved.

In numerical work, the determinant is not really a reliable indicator of singularity, and other data, such as the relative size of the matrix elements encountered during pivoting, are preferred when checking for singularity.

The determinant also occurs in the definition of the eigenvalue problem.

An explicit formula for the determinant of a matrix A is:

det ( A ) = sum [ over all P ] sign(P) * A(1,P(1)) * A(2,P(2) * ... * A(N,P(N)).
where the sum ranges over all possible permutations P of the numbers 1 through N, and sign(P) is +1 for an even permutation, and -1 for an odd permutation. (Any permutation may be accomplished by a sequence of switching pairs of objects. The permutation is called even or odd, depending on whether the number of switches is even or odd).

A numerical method for finding the determinant comes as a byproduct of the LU factorization used in Gaussian elimination. Typically, this factorization has the form

A = P * L * U,
and the value of the determinant is simply
det ( A ) = det ( P ) * product ( 1 <= I <= N ) U(I,I).
where det ( P ) is +1 or -1, again determined by the sign of the permutation.

Simple facts about the determinant:

A single elementary row operation has the following effect on the determinant:

For small matrices, the exact determinant is simple to compute by hand. The determinant of a 2 by 2 matrix

        a  b
        c  d
      
is a*d-b*c, while the determinant of a 3 by 3 matrix:
        a  b  c
        e  f  g
        h  i  j
      
is a * (f*j-g*i) - b * (e*j-g*h) + c * (e*i-f*h).

If absolutely necessary, the determinant of a matrix of order N can be computed recursively in terms of determinants of minor matrices. Let M(A,I,J) stand for the (I,J) minor matrix of A. Then the determinant of A is

det ( A ) = sum ( 1 <= J <= N ) (-1)(J+1) * Ai,j * det ( M(A,I,J) ).
Of course, now we need to compute the determinants of the N minor matrices, but the order of these matrices has been reduced by 1. Theoretically, we can represent the determinant of any of these matrices of order N-1 by a similar sum involving minor matrices of order N-2, and this process can be repeated until we reach matrices of order 1 or 2, whose determinants are easy to compute. In practice, this method is never used except in simple classroom exercises.

There is a geometric interpretation of the determinant. If the rows or columns of A are regarded as vectors in N dimensional space, then the determinant is the volume of a the parallelepiped, or "slanted cube" whose one corner is defined by these vectors.

LAPACK and LINPACK provide routines for computing the determinant of a matrix, after the matrix has been decomposed into LU factors.

Back to TABLE OF CONTENTS.

Diagonal Dominance

A matrix is (column) diagonally dominant if, for every column, the sum of the absolute values of the offdiagonal elements is never greater than the absolute value of the diagonal element.

The matrix is strictly diagonally dominant if the offdiagonal sum is always strictly less than the absolute value of the diagonal element.

The same definitions can be used to consider rows instead of columns. The terms column diagonally dominant and row diagonally dominant may be used, if necessary, to specify which case is being considered.

A strictly diagonally dominant matrix cannot be singular, by Gershgorin's Theorem.

A diagonally dominant matrix which is also irreducible cannot be singular.

Here is a diagonally dominant matrix which is not strictly diagonally dominant:

        -2  1  0  0  0
         1 -2  1  0  0
         0  1 -2  1  0
         0  0  1 -2  1
         0  0  0  1 -2
      

For a linear system A * x = b, if the matrix A is strictly diagonally dominant, then both Jacobi iteration and Gauss Seidel iteration are guaranteed to converge to the solution.

Back to TABLE OF CONTENTS.

Diagonal Matrix

A diagonal matrix is one whose only nonzero entries are along the main diagonal. For example:

        3 0 0
        0 4 0
        0 0 7
      

Simple facts about a diagonal matrix A:

Back to TABLE OF CONTENTS.

Diagonalizable Matrix

A diagonalizable matrix is any (square) matrix A which is similar to a diagonal matrix:

A = P * D * P-1.

This concept is important in the study of eigenvectors To see the relationship, post-multiply the equation by P:

A * P = P * D.
Looking at the columns of P as eigenvectors, and the diagonal entries of D as eigenvalues, this shows that a matrix is diagonalizable exactly when it has N linearly independent eigenvectors.

In certain cases, not only is a matrix diagonalizable, but the matrix P has a special form. The most interesting case is that of any (real) symmetric matrix A; not only can such a matrix be diagonalized, but the similarity matrix is orthogonal:

A = Q * D * Q-1 = Q * D * Q',
This fact can be interpreted to show that not only does every symmetric matrix have a complete set of eigenvectors, but the eigenvectors and eigenvalues are real, and the eigenvectors are pairwise orthogonal.

Similarly, a complex matrix that is a Hermitian has a complete set of eigenvectors, and the eigenvalues are real. (The eigenvectors in this case will not generally be real).

Simple facts:

Back to TABLE OF CONTENTS.

Downshift Matrix

The downshift matrix A circularly shifts all vector entries or matrix rows down 1 position.

Example:

        0 0 0 1
        1 0 0 0
        0 1 0 0
        0 0 1 0
      

Facts about the downshift matrix A:

Back to TABLE OF CONTENTS.

Eigenvalues

Eigenvalues are special values associated with a (square) matrix, which can be used to analyze its behavior in multiplying any vector.

The formal definition of an eigenvalue of a matrix A is that it is any value lambda which is a root of the characteristic equation of the matrix,

det ( A - lambda * I ) = 0.

lambda is an eigenvalue of A if and only if there is a nonzero vector x, known as an eigenvector (or sometimes a "right" eigenvector), with the property that

A * x = lambda * x.
Note that there must also be a "left" eigenvector y, with the property
y * A = A' * y = lambda * y.

The characteristic equation has exactly N roots, so a matrix has N eigenvalues. An important consideration is whether any eigenvalue is a repeated root, which determines how hard the eigenvector computation will be.

If a matrix has the maximum possible number of linearly independent eigenvectors (namely N, the order of the matrix), then the eigenvalues and eigenvectors can be used to diagonalize the matrix. This only happens when the matrix is normal.

Simple facts about eigenvalues of A:

Simple algorithms for computing eigenvalues include the power method and the inverse power method. The QR method is a more powerful method that can handle complex and multiple eigenvalues.

LAPACK and EISPACK include algorithms for computing the eigenvalues and eigenvectors of a variety of types of matrix, as well as methods that can be applied to more general eigenvalue problems.

Back to TABLE OF CONTENTS.

Eigenvectors

A nonzero vector x is an eigenvector of the square matrix A if

A * x = lambda * x
for some scalar value lambda, called the associated eigenvalue.

Sometimes this eigenvector is more particularly described as a right eigenvector, so that we may also consider left eigenvectors, that is, vectors y for which it is true that

y * A = A' * y = mu * y
for some scalar mu.

For every eigenvalue of a matrix, there is at least one eigenvector. Every nonzero multiple of this eigenvector is also an eigenvector, but in an uninteresting way. If, and only if, an eigenvalue is a repeated root, then there may be more than one linearly independent eigenvector associated with that eigenvalue. In particular, if an eigenvalue is repeated 3 times, then there will be 1, 2 or 3 linearly independent eigenvectors corresponding to that eigenvalue.

Facts about eigenvectors:

Back to TABLE OF CONTENTS.

EISPACK

EISPACK is a package of routines for handling the standard and generalized eigenvalue problems.

The beginning user who is not interested in trying to learn the details of EISPACK and simply wants the answer to an eigenvalue problem quickly should call one of the main driver routines. Each of these is tailored to handle a given problem completely with a single subroutine call. For more advanced work, it may be worth investigating some of the underlying routines.

Driver routines to solve A*x=lambda*x include:

For the generalized eigenvalue problem:

Back to TABLE OF CONTENTS.

EISPACK Matrix Norm

The EISPACK matrix norm is used in the EISPACK eigenvalue package.

The definition of the norm for an M by N matrix is:

||A|| = sum ( I = 1 to M, J = 1 to N ) | Ai,j |

It's a simple exercise to verify that this quantity satisifes the requirements for a matrix norm.

This norm is easy to calculate, and was used in EISPACK in order to have a standard against which to compare the size of matrix elements that were being driven to zero. I haven't seen it used anywhere else in practice.

Back to TABLE OF CONTENTS.

Elementary Column Operations

Elementary column operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to column echelon form.

Restricting the operations to a simple set makes it easy to:

The three elementary column operations include:

Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as postmultiplication by a concatenation of elementary matrices:

B = A * E(1) * E(2) * ... * E(k)
which may be abbreviated as:
B = A * C
Since C will be guaranteed to be invertible, we also know that,
B * C-1 = A
which yields a factorization of A.

Back to TABLE OF CONTENTS.

Elementary Matrix

An elementary matrix E is one which, when pre-multiplying another matrix A, produces a product matrix E * A which has exactly one of the following properties:

The matrix E which interchanges rows R1 and R2 of matrix A has the form E(I,J)=:

The inverse of this matrix is simply its transpose

The matrix E which multiplies row R1 of A by the constant s has the form E(I,J)=:

The inverse of this matrix is constructed by negating the value of s.

The matrix E which adds s * row R2 to row R1 of A has the form E(I,J) =:

The inverse of this matrix is constructed in the same way, using 1/s.

If a matrix F can be represented as the product of elementary matrices,

F = E1 * E2 * ... * EM,
then its inverse is:
F-1 = EM-1 * EM-1-1 * ... * E1-1.

An elementary similarity transformation uses a matrix F which is the product of elementary matrices, and transforms the matrix A into the similar matrix B by the formula

B = F-1 * A * F.

Back to TABLE OF CONTENTS.

Elementary Row Operations

Elementary row operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to row echelon form.

Restricting the operations to a simple set makes it easy to:

The three elementary row operations include:

Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as premultiplication by a concatenation of elementary matrices:

B = E(k) * E(k-1) * ... * E(2) * E(1) * A
which may be abbreviated as:
B = C * A.
Since C will be guaranteed to be invertible, we also know that,
C-1 * B = A
which yields a factorization of A.

Back to TABLE OF CONTENTS.

Ellipsoids

An ellipsoid is an N dimensional generalization of an ellipse. The formula for an ellipsoid may be written as:

sum ( I = 1 to N, J = 1 to N ) Ai,j * Xi * Xj = 1.
where A is a positive definite symmetric matrix.

A principal axis of an ellipsoid is any N dimensional point X on the ellipsoid such that the vector from the origin to X is normal to the ellipsoid.

In the general case, there are exactly N principal axes (plus their negatives). In degenerate cases, there may be an entire plane of vectors that satisfy the requirement, but it is always possible to choose a set of N principal axes which are linearly independent.

Moreover, in the general case, the principal axes are pairwise orthogonal vectors, and in the degenerate case, may be chosen pairwise orthogonal.

Moreover, it is always true that the principal axes are eigenvectors of the matrix A of ellipsoid coefficients. The length of the principal axis vector associated with an eigenvalue lambda(I) is 1 / Sqrt ( lambda(I) ).

These facts have a strong relationship to the formulation of the conjugate gradient method.

Back to TABLE OF CONTENTS.

Equilibration

Equilibration is the technique of balancing the rows or columns of a matrix by rescaling them.

Consider, for instance, the fact that the following two equations are equivalent:

0.0001 * x + 0.0001 * y = 0.0001
and
1000 * x + 1000 * y = 1000
However, the large coefficients in the second equation will bias a Gauss elimination routine to choose that equation as its pivot. Actually, it's more important in this case that the chosen row be as "linearly independent as possible" from the other rows, and this is more likely to occur if we ensure that all the rows start out with an equal norm. This can be done very simply, by finding the element of maximum absolute value in each row and dividing that row (and its right hand side) by that value. Such a technique is called row equilibration. It is not necessary that the rows have precisely the same norm; it is desirable that the norms of the rows be maintained within some controlled range.

Equilibration is useful in many areas of linear algebra, including eigenvalue calculations. In some cases, column equilibration is preferred, and in other cases, the norms of both the rows and columns are to be controlled.

Back to TABLE OF CONTENTS.

Equivalent Matrix

Matrices A and B are said to be equivalent if there are nonsingular matrices P and Q so that

A = P * B * Q