5.8. Matrix Factorization Methods#
import numpy as np
We will cover three matrix factorization techniques used to solve systems of linear equations of the form

\[
A\mathbf{x} = \mathbf{b}.
\]
The three methods are Cholesky decomposition, QR factorization, and singular value decomposition (SVD).
Before presenting each factorization method, we review a few properties of matrices.
Symmetric Matrix: A matrix \(A\) is symmetric if it equals its transpose, \(A^T = A\). Here is a Python example.
# Define a symmetric matrix
A = np.array([[4, 1, 2],
              [1, 3, 5],
              [2, 5, 6]])
# Check symmetry by inspection
print(A.T)
[[4 1 2]
[1 3 5]
[2 5 6]]
# Check if the matrix is symmetric (A == A.T)
np.array_equal(A, A.T)
True
5.8.1. Positive Definite Matrix#
A positive definite matrix is a symmetric matrix that has all positive eigenvalues. Key properties include:
Symmetry: A positive definite matrix must be symmetric, i.e., \(A = A^T\).
Eigenvalues: All eigenvalues of a positive definite matrix are positive.
Cholesky Decomposition: A positive definite matrix always has a Cholesky decomposition. This means it can be factored as \(A = LL^T\), where \(L\) is a lower triangular matrix.
Invertibility: A positive definite matrix is always invertible, and its inverse is also positive definite.
Quadratic Form: The quadratic form \(\mathbf{x}^T A \mathbf{x}\) is always positive for any non-zero vector \(\mathbf{x}\).
Example - Positive Definite Matrix: Consider the matrix

\[
A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}.
\]

For any non-zero vector \(\mathbf{x} = [x_1, x_2]^T\), the quadratic form is

\[
\mathbf{x}^T A \mathbf{x} = 2x_1^2 + 3x_2^2.
\]

Since both coefficients are positive, the quadratic form is always positive for non-zero \(\mathbf{x}\). The eigenvalues are 2 and 3, both positive, so \(A\) is positive definite.
Example - Not Positive Definite: Consider the matrix

\[
B = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}.
\]

For the vector \(\mathbf{x} = [0, 1]^T\), the quadratic form is

\[
\mathbf{x}^T B \mathbf{x} = (1)(0)^2 + (-2)(1)^2 = -2 < 0.
\]

Since the quadratic form can be negative, \(B\) is not positive definite. The eigenvalues are 1 and -2, and the negative eigenvalue confirms \(B\) is not positive definite.
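As a quick numerical check of the two examples above, we can confirm the eigenvalue signs and evaluate a sample quadratic form in NumPy:

# Eigenvalues of the two example matrices
A = np.array([[2, 0], [0, 3]])   # positive definite example
B = np.array([[1, 0], [0, -2]])  # not positive definite
print(np.linalg.eigvals(A))      # [2. 3.] -- all positive
print(np.linalg.eigvals(B))      # [ 1. -2.] -- one negative
# Quadratic form x^T B x for x = [0, 1]^T
x = np.array([0, 1])
print(x @ B @ x)                 # -2, so B is not positive definite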
Method to Check Positive Definite: To determine if a matrix is positive definite, check that the matrix is symmetric (\(A = A^T\)) and all the eigenvalues of \(A\) are positive.
Here is Python code to check if a matrix is positive definite.
A = np.array([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
# check if symmetric
print(A.T)
print(A.T == A)
[[ 2 -1 0]
[-1 2 -1]
[ 0 -1 2]]
[[ True True True]
[ True True True]
[ True True True]]
Eigenvalues and Eigenvectors in NumPy: To get the eigenvalues and eigenvectors, use np.linalg.eig(array). This returns a named tuple; the first element contains the eigenvalues and the second contains the eigenvectors.
Here is np.linalg.eig applied to our example.
np.linalg.eig(A)
EigResult(eigenvalues=array([3.41421356, 2. , 0.58578644]), eigenvectors=array([[-5.00000000e-01, -7.07106781e-01, 5.00000000e-01],
[ 7.07106781e-01, 4.05405432e-16, 7.07106781e-01],
[-5.00000000e-01, 7.07106781e-01, 5.00000000e-01]]))
# check if all eigenvalues are positive
eig_val, eig_vec = np.linalg.eig(A)
(eig_val > 0).all()
np.True_
5.8.2. Cholesky Decomposition#
If \(A\) is a symmetric positive-definite matrix, we can factor \(A\) by Cholesky decomposition.
Cholesky decomposition allows us to factor \(A\) into

\[
A = LL^T,
\]

where \(L\) is a lower triangular matrix and \(L^T\) is its transpose, an upper triangular matrix. Here is the procedure:
Step 1: Write the equation in terms of the Cholesky factorization

\[
LL^T \mathbf{x} = \mathbf{b}.
\]
Step 2: Solve the intermediate system

\[
L\mathbf{y} = \mathbf{b},
\]

where \(\mathbf{y} = L^T \mathbf{x}\), by forward substitution, because \(L\) is lower triangular.
Step 3: With \(\mathbf{y}\) in hand, solve for \(\mathbf{x}\)

\[
L^T \mathbf{x} = \mathbf{y}
\]

by backward substitution.
Summary: If the matrix \(A\) is symmetric and positive definite, use Cholesky to factor \(A = LL^T\). Then solve \(L\mathbf{y} = \mathbf{b}\) by forward substitution, and solve \(L^T \mathbf{x} = \mathbf{y}\) by backward substitution.
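For reference, SciPy packages this entire procedure: scipy.linalg.cho_factor computes the Cholesky factor and scipy.linalg.cho_solve performs the forward and backward substitutions. Here is a minimal sketch, using a small symmetric positive definite matrix chosen for illustration.

import numpy as np
from scipy.linalg import cho_factor, cho_solve
# Factor A once, then solve L L^T x = b in one call
A = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([2.0, 1.0])
c, low = cho_factor(A)        # packed Cholesky factor
x = cho_solve((c, low), b)    # forward + backward substitution
print(np.allclose(A @ x, b))  # True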
Forward and backward substitution are methods for solving triangular systems of linear equations. These techniques are used within matrix factorization methods described in these notes.
Forward Substitution: Forward substitution solves a lower triangular system \(L\mathbf{x} = \mathbf{b}\). We solve for the unknowns starting from the first equation and working downward.
For a \(3 \times 3\) lower triangular matrix, the system has the form

\[
\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.
\]

This gives us the system of equations

\[
\begin{aligned}
l_{11} x_1 &= b_1 \\
l_{21} x_1 + l_{22} x_2 &= b_2 \\
l_{31} x_1 + l_{32} x_2 + l_{33} x_3 &= b_3.
\end{aligned}
\]

We solve these equations sequentially. From the first equation we get

\[
x_1 = \frac{b_1}{l_{11}}.
\]

Substituting into the second equation gives

\[
x_2 = \frac{b_2 - l_{21} x_1}{l_{22}}.
\]

Finally, substituting both \(x_1\) and \(x_2\) into the third equation gives

\[
x_3 = \frac{b_3 - l_{31} x_1 - l_{32} x_2}{l_{33}}.
\]

The general formula for forward substitution is

\[
x_i = \frac{b_i - \sum_{j=1}^{i-1} l_{ij} x_j}{l_{ii}}, \quad i = 1, 2, \ldots, n.
\]
Backward Substitution: Backward substitution solves an upper triangular system \(U\mathbf{x} = \mathbf{b}\). We solve for the unknowns starting from the last equation and working upward.
For a \(3 \times 3\) upper triangular matrix, the system has the form

\[
\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.
\]

This gives us the system of equations

\[
\begin{aligned}
u_{11} x_1 + u_{12} x_2 + u_{13} x_3 &= b_1 \\
u_{22} x_2 + u_{23} x_3 &= b_2 \\
u_{33} x_3 &= b_3.
\end{aligned}
\]

We solve these equations in reverse order. From the third equation we get

\[
x_3 = \frac{b_3}{u_{33}}.
\]

Substituting into the second equation gives

\[
x_2 = \frac{b_2 - u_{23} x_3}{u_{22}}.
\]

Finally, substituting both \(x_2\) and \(x_3\) into the first equation gives

\[
x_1 = \frac{b_1 - u_{12} x_2 - u_{13} x_3}{u_{11}}.
\]

The general formula for backward substitution is

\[
x_i = \frac{b_i - \sum_{j=i+1}^{n} u_{ij} x_j}{u_{ii}}, \quad i = n, n-1, \ldots, 1.
\]
Example: Consider solving the lower triangular system

\[
\begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ 2 & 1 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}.
\]

Using forward substitution we solve

\[
x_1 = \frac{4}{2} = 2, \quad
x_2 = \frac{5 - (1)(2)}{3} = 1, \quad
x_3 = \frac{13 - (2)(2) - (1)(1)}{4} = 2.
\]
These substitution methods are efficient because they require only \(O(n^2)\) operations for an \(n \times n\) system, compared to \(O(n^3)\) for general Gaussian elimination. If we can manipulate the equations so the matrix is upper or lower triangular, the solution can be easily found by backward or forward substitution.
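As a quick check of the worked example above, SciPy's solve_triangular performs these substitutions directly (lower=True selects forward substitution).

from scipy.linalg import solve_triangular
# The lower triangular system from the example above
L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [2.0, 1.0, 4.0]])
b = np.array([4.0, 5.0, 13.0])
x = solve_triangular(L, b, lower=True)  # forward substitution
print(x)  # [2. 1. 2.]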
Consider the linear system \(A\mathbf{x} = \mathbf{b}\), where

\[
A = \begin{bmatrix} 6 & 3 & 4 & 8 \\ 3 & 6 & 5 & 1 \\ 4 & 5 & 10 & 7 \\ 8 & 1 & 7 & 25 \end{bmatrix},
\quad
\mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}.
\]
Below, Cholesky decomposition is used to solve for \(\mathbf{x}\).
# A matrix
A = np.array([
    [6, 3, 4, 8],
    [3, 6, 5, 1],
    [4, 5, 10, 7],
    [8, 1, 7, 25]
], dtype=float)
# Define the vector b
b = np.array([1, 2, 3, 4], dtype=float)
# check rank, rank = number of independent rows and columns
np.linalg.matrix_rank(A)
np.int64(4)
# check symmetric
A.T
array([[ 6., 3., 4., 8.],
[ 3., 6., 5., 1.],
[ 4., 5., 10., 7.],
[ 8., 1., 7., 25.]])
# check eigenvalues all positive
np.linalg.eig(A)
EigResult(eigenvalues=array([31.48923716, 11.01742013, 1.25941947, 3.23392324]), eigenvectors=array([[ 0.34254383, 0.15773425, 0.61341024, 0.6939103 ],
[ 0.14709227, 0.62461793, -0.66834122, 0.37621313],
[ 0.37453336, 0.62134938, 0.32021262, -0.60919094],
[ 0.84897135, -0.44597904, -0.27296843, -0.0764106 ]]))
# Cholesky method
L = np.linalg.cholesky(A)
L
array([[ 2.44948974, 0. , 0. , 0. ],
[ 1.22474487, 2.12132034, 0. , 0. ],
[ 1.63299316, 1.41421356, 2.30940108, 0. ],
[ 3.26598632, -1.41421356, 1.58771324, 3.13249102]])
# upper triangular
L.T
array([[ 2.44948974, 1.22474487, 1.63299316, 3.26598632],
[ 0. , 2.12132034, 1.41421356, -1.41421356],
[ 0. , 0. , 2.30940108, 1.58771324],
[ 0. , 0. , 0. , 3.13249102]])
def forward_sub(L, b):
    """
    Solves Ly = b for y using forward substitution,
    where L is a lower triangular matrix.
    """
    n = L.shape[0]
    y = np.zeros_like(b, dtype=float)
    for i in range(n):
        if L[i, i] == 0:
            raise ValueError("Matrix is singular.")
        y[i] = (b[i] - np.dot(L[i, :i], y[:i])) / L[i, i]
    return y

def back_sub(U, y):
    """
    Solves Ux = y for x using back substitution,
    where U is an upper triangular matrix.
    """
    n = U.shape[0]
    x = np.zeros_like(y, dtype=float)
    for i in reversed(range(n)):
        if U[i, i] == 0:
            raise ValueError("Matrix is singular.")
        x[i] = (y[i] - np.dot(U[i, i+1:], x[i+1:])) / U[i, i]
    return x
y = forward_sub(L, b)
y
array([0.40824829, 0.70710678, 0.57735027, 0.87789557])
x = back_sub(L.T, y)
x
array([-0.48619958, 0.48195329, 0.05732484, 0.28025478])
# check the result by verifying that b is returned
np.matmul(A,x)
array([1., 2., 3., 4.])
5.8.3. QR Decomposition#
QR factorization decomposes a matrix \(A\) into the product of two matrices

\[
A = QR,
\]

where \(Q\) has orthonormal columns (i.e., \(Q^T Q = I\)) and \(R\) is upper triangular.
Note: If \(A \in \mathbb{R}^{m \times n}\) with \(m > n\), then \(Q \in \mathbb{R}^{m \times n}\) is not square, but its columns are orthonormal, and \(R \in \mathbb{R}^{n \times n}\) is square and upper triangular. This is called the economy-size QR decomposition and is used when \(A\) is tall and full-rank.
In contrast, the full QR decomposition produces a square \(Q \in \mathbb{R}^{m \times m}\), with \(R \in \mathbb{R}^{m \times n}\), and may include zero rows in \(R\) when \(m > n\).
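Here is a short sketch contrasting the two forms on a tall \(3 \times 2\) matrix; NumPy's np.linalg.qr calls them mode='complete' (full) and mode='reduced' (economy, the default).

A_tall = np.array([[1.0, 1.0],
                   [1.0, -1.0],
                   [1.0, 2.0]])
Q_full, R_full = np.linalg.qr(A_tall, mode='complete')  # full QR
Q_econ, R_econ = np.linalg.qr(A_tall, mode='reduced')   # economy QR
print(Q_full.shape, R_full.shape)  # (3, 3) (3, 2)
print(Q_econ.shape, R_econ.shape)  # (3, 2) (2, 2)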
Solving \(A\mathbf{x} = \mathbf{b}\) with QR: Starting from

\[
A\mathbf{x} = \mathbf{b},
\]

we substitute \(A = QR\) to get

\[
QR\mathbf{x} = \mathbf{b}.
\]

Multiplying both sides by \(Q^T\) gives

\[
R\mathbf{x} = Q^T \mathbf{b},
\]

since \(Q^T Q = I\). We then solve the upper triangular system

\[
R\mathbf{x} = Q^T \mathbf{b}
\]

using backward substitution.
Python Example:
import numpy as np
from scipy.linalg import qr, solve_triangular

# Example matrix A and vector b
A = np.array([
    [1, 1],
    [1, -1],
    [1, 2]
], dtype=float)
b = np.array([4, 2, 5], dtype=float)

# Economy QR decomposition
Q, R = qr(A, mode='economic')

# Solve R x = Q^T b
Qt_b = Q.T @ b
x = solve_triangular(R, Qt_b)
print("Solution x:", x)
In the example above, you will get an error if you do not specify mode='economic', because the full decomposition returns a \(3 \times 3\) matrix \(Q\) and a \(3 \times 2\) matrix \(R\), and solve_triangular requires a square triangular matrix.
Here is another Python example, this time applying NumPy's np.linalg.qr to the \(4 \times 4\) matrix \(A\) and vector \(\mathbf{b}\) from the Cholesky example above.
# QR decomposition
Q, R = np.linalg.qr(A)
Q
array([[-0.53665631, 0.01164445, 0.53083247, -0.65580584],
[-0.26832816, -0.72195591, 0.37863574, 0.51323935],
[-0.35777088, -0.47742246, -0.73871091, -0.31364627],
[-0.71554175, 0.50071136, -0.1707573 , 0.45621276]])
# check that Q is orthogonal (Q^T Q = I)
Q.T@Q
array([[ 1.00000000e+00, 3.32936238e-17, 2.22818207e-17,
-3.77283718e-17],
[ 3.32936238e-17, 1.00000000e+00, -2.69673574e-17,
-5.57127010e-17],
[ 2.22818207e-17, -2.69673574e-17, 1.00000000e+00,
1.71045481e-16],
[-3.77283718e-17, -5.57127010e-17, 1.71045481e-16,
1.00000000e+00]])
R
array([[-11.18033989, -5.72433402, -12.07476708, -24.95451863],
[ 0. , -6.18320305, -4.83244683, 8.54702644],
[ 0. , 0. , -4.56590162, -4.81461334],
[ 0. , 0. , 0. , 4.4765877 ]])
# solve by QR: back substitute on R x = Q^T b
y = Q.T @ b
x = back_sub(R, y)
x
array([-0.48619958, 0.48195329, 0.05732484, 0.28025478])
# check results by verifying that b is returned
A@x
array([1., 2., 3., 4.])
5.8.4. Singular Value Decomposition (SVD)#
Any real matrix \(A\) has a singular value decomposition (SVD)

\[
A = U \Sigma V^T,
\]

where \(U\) is an \(m \times m\) orthogonal matrix (\(U^TU=I\)), \(\Sigma\) is an \(m \times n\) diagonal matrix with non-negative singular values \(\sigma_i \geq 0\), and \(V^T\) is the transpose of an \(n \times n\) orthogonal matrix (\(V^TV = I\)). For \(m \geq n\), the matrix \(\Sigma\) has the form

\[
\Sigma = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n \\
0 & 0 & \cdots & 0
\end{bmatrix},
\]

with the singular values conventionally ordered \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0\) and rows of zeros at the bottom when \(m > n\).
SVD Properties: SVD works for any matrix (square, rectangular, full-rank, or rank-deficient), is numerically stable and robust to ill-conditioning, and is useful in least-squares problems or when \(A\) is not invertible.
To use the SVD method, write

\[
A\mathbf{x} = \mathbf{b}
\]

in terms of the SVD decomposition

\[
U \Sigma V^T \mathbf{x} = \mathbf{b}.
\]

Multiply both sides by \(U^T\) to get

\[
\Sigma V^T \mathbf{x} = U^T \mathbf{b}.
\]

Let \(\mathbf{y} = V^T \mathbf{x}\) so that

\[
\Sigma \mathbf{y} = U^T \mathbf{b}.
\]

Solve for \(\mathbf{y}\) element-wise

\[
y_i = \frac{(U^T \mathbf{b})_i}{\sigma_i},
\]

then compute

\[
\mathbf{x} = V \mathbf{y}.
\]
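Here is a minimal sketch of these steps in NumPy, reusing the \(4 \times 4\) matrix \(A\) and vector \(\mathbf{b}\) from the Cholesky example (all of its singular values are non-zero).

A = np.array([[6, 3, 4, 8],
              [3, 6, 5, 1],
              [4, 5, 10, 7],
              [8, 1, 7, 25]], dtype=float)
b = np.array([1, 2, 3, 4], dtype=float)
U, S, Vt = np.linalg.svd(A)   # np.linalg.svd returns V transpose
y = (U.T @ b) / S             # element-wise: y_i = (U^T b)_i / sigma_i
x = Vt.T @ y                  # x = V y
print(np.allclose(A @ x, b))  # True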
5.8.5. Pseudo-Inverse#
When any of the singular values \(\sigma_i\) are zero, \(\Sigma^{-1}\) does not exist, since it would require dividing by zero, \(1/\sigma_i = 1/0\).
The matrix that recovers all recoverable information is called the pseudo-inverse, often denoted \(A^+\). We can obtain the pseudo-inverse from the SVD by inverting all non-zero singular values and leaving all zero singular values at zero. Here is an example where the second singular value is zero:

\[
\Sigma = \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \end{bmatrix}
\quad \Longrightarrow \quad
\Sigma^+ = \begin{bmatrix} 1/\sigma_1 & 0 \\ 0 & 0 \end{bmatrix},
\qquad
A^+ = V \Sigma^+ U^T.
\]
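Here is a sketch of building the pseudo-inverse from the SVD of a rank-deficient matrix (the matrix below is chosen for illustration), compared against NumPy's built-in np.linalg.pinv.

# Rank-deficient matrix: the second row is a multiple of the first
C = np.array([[1.0, 2.0],
              [2.0, 4.0]])
U, S, Vt = np.linalg.svd(C)
print(S)                      # the second singular value is (numerically) zero
S_plus = np.zeros_like(S)
nonzero = S > 1e-12           # invert only the non-zero singular values
S_plus[nonzero] = 1.0 / S[nonzero]
C_plus = Vt.T @ np.diag(S_plus) @ U.T
print(np.allclose(C_plus, np.linalg.pinv(C)))  # True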
Here is a Python example of SVD in action, again solving the \(4 \times 4\) system from the Cholesky example.
# SVD example
U, S, Vt = np.linalg.svd(A)  # the third return value is V transpose
U
array([[-0.34254383, -0.15773425, 0.6939103 , -0.61341024],
[-0.14709227, -0.62461793, 0.37621313, 0.66834122],
[-0.37453336, -0.62134938, -0.60919094, -0.32021262],
[-0.84897135, 0.44597904, -0.0764106 , 0.27296843]])
U.T@U
array([[ 1.00000000e+00, -3.44691053e-17, -1.33679825e-16,
4.61942119e-17],
[-3.44691053e-17, 1.00000000e+00, 1.16137298e-16,
-1.32691025e-17],
[-1.33679825e-16, 1.16137298e-16, 1.00000000e+00,
9.56240473e-17],
[ 4.61942119e-17, -1.32691025e-17, 9.56240473e-17,
1.00000000e+00]])
Vt.T @ Vt  # check that V is orthogonal (V V^T = I)
array([[1.00000000e+00, 1.10990843e-16, 1.59002611e-16, 1.77819624e-16],
[1.10990843e-16, 1.00000000e+00, 4.61053319e-16, 1.67794199e-16],
[1.59002611e-16, 4.61053319e-16, 1.00000000e+00, 3.51079312e-17],
[1.77819624e-16, 1.67794199e-16, 3.51079312e-17, 1.00000000e+00]])
S
array([31.48923716, 11.01742013, 3.23392324, 1.25941947])
# invert the singular values (all non-zero here); x = V Sigma^{-1} U^T b
S_inv = np.diag(1/S)
x = Vt.T @ S_inv @ U.T @ b
x
array([-0.48619958, 0.48195329, 0.05732484, 0.28025478])
A@x
array([1., 2., 3., 4.])
5.8.5.1. Matrix Factorization Example Problems#
Problem 1
Solve the system of linear equations \(A \mathbf{x} = \mathbf{b}\) using Cholesky factorization, where
1.1) Show that the matrix is symmetric.
1.2) Check that all the eigenvalues are positive.
1.3) Conclude that the matrix is positive definite.
1.4) Implement the procedure to solve by Cholesky factorization.
Problem 2
Solve the following system of linear equations using QR factorization:
We write this in matrix form:
Problem 3
Consider the following linear system:
where
Solve for \( \mathbf{x} \) using Singular Value Decomposition (SVD).