5.6. Matrix Condition Number#
draft version do not distribute
%%html
<div align="center">
<iframe width="600" height="400" src="https://www.youtube.com/embed/Axdj-m4bjyw?si=B_VQllegJXDVVkA3" frameborder="0" allowfullscreen></iframe>
</div>
References:
A condition number measures how sensitive a function’s output is to changes or errors in its input. Suppose you have a function \(y=f(x)\). If you perturb \(x\) by a small amount \(\delta x\), and the relative difference \([f(x + \delta x) - f(x)]/f(x)\) is small, then the condition number is small. If \([f(x + \delta x) - f(x)]/f(x)\) is large, then the condition number is large. The condition number measures how error is magnified in computational problems.
Example: Consider the simple function \(f(x) = x - 1\). Let \(x = 10\) and perturb it by \(\delta x = 0.1\), so \(x + \delta x = 10.1\). Then
The relative change in the output is
Now consider \(x = 1.1\) with the same perturbation \(\delta x = 0.1\), so \(x + \delta x = 1.2\). Then
The relative change in the output is
The same absolute perturbation \(\delta x = 0.1\) causes a relative change of about 1% when \(x = 10\), but a 100% relative change when \(x = 1.1\). The function is poorly conditioned near \(x = 1\) where \(f(x)\) is close to zero.
In these notes, we focus on linear systems of the form
The question we address is how the solution to \(A\mathbf{x} = \mathbf{b}\) changes if we introduce small perturbations to the elements of \(A\) or \(\mathbf{b}\). The matrix condition number quantifies the impact these small changes have on the solution. T
We will utilize the properties of matrix norms. The key properties of matrix norms are
\(\| A \| > 0\)
\(\| A \| = 0\) if and only if all elements are zero, i.e., \(A_{ij} = 0\) for all \(i\), \(j\)
\(\| k A \| = |k| \| A \|\), where \(k\) is a scalar
\(\| A + B \| \leq \| A \| + \| B \|\), the triangle inequality
\(\| A B \| \leq \| A \| \| B \|\), the submultiplicative property
Triangle Inequality: To understand the triangle inequality, consider physical vectors in space. Suppose you walk from point A to B, call this vector \(\mathbf{v}_1\), then walk from point B to C, call this vector \(\mathbf{v}_2\). The total distance traveled is \(\|\mathbf{v}_1\| + \|\mathbf{v}_2\|\). However, the direct distance from A to C is \(\|\mathbf{v}_1 + \mathbf{v}_2\|\). Visualize \(\mathbf{v}_1\) as one side of a triangle, \(\mathbf{v}_2\) as another side, and \(\|\mathbf{v}_1 + \mathbf{v}_2\|\) as the third side. The direct path is always shorter than or equal to the sum of the two segments
This property holds for matrix norms as well.
Submultiplicative Property: The submultiplicative property states that the norm of a product is bounded by the product of the norms. For matrices, this means
The submultiplicative property can be understood as the “net-stretching” of composite linear transformations being no more than the product of the individual stretching of each transformation. Matrix \(A\) can stretch vectors by at most a factor of \(\|A\|\), and \(B\) can stretch vectors by at most a factor of \(\|B\|\). Applying both transformations in sequence can stretch vectors by at most the product \(\|A\| \|B\|\). The actual stretching may be less if the transformations partially cancel, which is why we have an inequality rather than equality.
A common matrix norm is the Frobenius norm (also known as the Euclidean norm or \(\ell^2\) norm), which is the square root of the sum of all elements squared
You can check this in Python with the following code.
A = np.random.rand(3,3)
print(np.linalg.norm(A))
norm = 0
for i in range(3):
for j in range(3):
norm += A[i,j]**2
print(np.sqrt(norm))
5.6.1. Condition Number#
We will now derive the condition number. Keep in mind that there are multiple ways to derive the condition number, and we present two complementary approaches below.
Derivation 1 : Perturbation on the Right Hand Side
We will motivate the definition of the condition number of the matrix \(A\) by finding an inequality that bounds the relative error in \(\mathbf{x}\) to the relative error in \(\mathbf{b}\):
Consider the system \(A(\mathbf{x} + \Delta \mathbf{x}) = \mathbf{b} + \Delta \mathbf{b}\) where \(A\) is invertible.
Then \(\mathbf{x} + \Delta \mathbf{x} = A^{-1}(\mathbf{b} + \Delta \mathbf{b}) = A^{-1}\mathbf{b} + A^{-1}\Delta \mathbf{b} = \mathbf{x} + A^{-1}\Delta \mathbf{b}\), and the change in the solution to the system is given by \(\Delta \mathbf{x} = A^{-1}\Delta \mathbf{b}\).
Using the submultiplicative property of matrix norms, we can find a bound on the absolute error:
If we multiply both sides by \(\| \mathbf{b} \|\) and use \(\| \mathbf{b} \| = \|A\mathbf{x}\| \leq \| A \| \|\mathbf{x}\|\), we have:
Dividing both sides by \(\|\mathbf{x}\| \|\mathbf{b}\|\), we obtain a bound on the relative error:
This inequality shows that the relative change in the solution \(\mathbf{x}\) is at most \(\kappa(A) = \|A\| \|A^{-1}\|\) times the relative change in \(\mathbf{b}\).
Derivation 2: Perturbation in the Matrix
We now consider what happens when we perturb the matrix \(A\) itself by adding a small change, \(\Delta A\), so that \(A \to A + \Delta A\). As a result, we expect the output \(\mathbf{x}\) to change under this perturbation. We assume the perturbations \(\Delta A\) and \(\Delta \mathbf{x}\) are small.
Adding perturbations to the equation \(A \mathbf{x} = \mathbf{b}\), we have
The first term on the left hand side and the right hand side are equal, since \(A \mathbf{x} = \mathbf{b}\). Since we assume perturbations are small, the term \(\Delta A \Delta \mathbf{x}\) is second-order in small quantities and can be neglected. This gives us
We now take the norm of both sides
where we used the submultiplicative property of matrix norms: \(\|ABC\| \leq \|A\| \cdot \|B\| \cdot \|C\|\).
Continuing we have
where \(\kappa(A) = \|A^{-1}\| \cdot \|A\|\) is the condition number.
Definition: The condition number of the square matrix \(A\) is defined as
If the condition number is large, then even small perturbations in \(A\) or \(\mathbf{b}\) can lead to large relative errors in the solution \(\mathbf{x}\). The condition number provides a unified measure of sensitivity:
for perturbations in \(\mathbf{b}\), and
for perturbations in \(A\).
5.6.1.1. What is a Large Condition Number#
condition numbers on the order of 1 are excellent
condition numbers on the order of \(10\) to \(10^2\) are considered “well-conditioned”
condition numbers larger than \(10^2\) are “ill-conditioned”
How to get the Condition Number Numpy provides a function for the condition number, np.linalg.cond().
Here are Some Examples
import numpy as np
from scipy.linalg import hilbert
Example From Video
Example Find the solution \(\mathbf{x}\) to \(A \mathbf{x}= \mathbf{b}\) for the above example. Use NumPy to get the condition number of \(A\). Perturb \(\mathbf{b}\) and observe how the output changes.
A = np.array([[0.2161, 0.1441],[1.2969, 0.8648]])
b = np.array([0.1440, 0.8642])
A
array([[0.2161, 0.1441],
[1.2969, 0.8648]])
b
array([0.144 , 0.8642])
# A has a very large condition number
kappa=np.linalg.cond(A)
kappa
np.float64(249729263.7996639)
# solve for x
x = np.linalg.solve(A, b)
x
array([ 2., -2.])
# check solution
A@x
array([0.144 , 0.8642])
# add perturbation to b
db = np.array([0, .00001])
bp = b + db
bp
array([0.144 , 0.86421])
# see how perturbation changes output
xp = np.linalg.solve(A, bp)
xp
array([ 146.10000023, -218.10000034])
5.6.2. Hilbert Matrix#
A Hilbert matrix \( H \) has elements defined by
A \(3 \times 3\) Hilbert matrix is
Hilbert matrices are known to have large condition numbers. Below is an example of a \(5 \times 5\) Hilbert matrix. You can use np.linalg.cond() to compute the condition number of the matrix.
Example Create a \(5 \times 5\) Hilbert matrix and solve \(H\mathbf{x} = \mathbf{b}\) for \(\mathbf{b} = [1~ 1~ 1~ 1~ 1]^T\).
from scipy.linalg import hilbert
# Create a 5x5 Hilbert matrix
H = hilbert(5)
# Create a right-hand side vector (e.g., all ones)
b = np.ones(5)
# Solve Hx = b
x = np.linalg.solve(H, b)
# Calculate condition number
cond_number = np.linalg.cond(H)
print("Hilbert matrix (5x5):\n", H)
print("\nThis is vector b:", b)
print("\nCondition number:", cond_number)
print("\nSolution x:", x)
print("\nCheck soltuion recovers b:", H@x)
Hilbert matrix (5x5):
[[1. 0.5 0.33333333 0.25 0.2 ]
[0.5 0.33333333 0.25 0.2 0.16666667]
[0.33333333 0.25 0.2 0.16666667 0.14285714]
[0.25 0.2 0.16666667 0.14285714 0.125 ]
[0.2 0.16666667 0.14285714 0.125 0.11111111]]
This is vector b: [1. 1. 1. 1. 1.]
Condition number: 476607.2502425855
Solution x: [ 5. -120. 630. -1120.00000001
630. ]
Check soltuion recovers b: [1. 1. 1. 1. 1.]
Let’s change the vector \(\mathbf{b}\) slightly by changing the first element to 0.1. Notice how much the solution \(\mathbf{x}\) changes.
# Solve Hx = b
bp=np.copy(b)
bp[0] = .1
xp = np.linalg.solve(H, bp)
# Calculate condition number
cond_number = np.linalg.cond(H)
print("Hilbert matrix (5x5):\n", H)
print("\nThis is vector b:", bp)
print("\nCondition number:", cond_number)
print("\nThe perturbed solution:", xp)
print("\nThe unperturbed solution:", x)
print("\nCheck soltuion recovers perturbed b:", H@xp)
Hilbert matrix (5x5):
[[1. 0.5 0.33333333 0.25 0.2 ]
[0.5 0.33333333 0.25 0.2 0.16666667]
[0.33333333 0.25 0.2 0.16666667 0.14285714]
[0.25 0.2 0.16666667 0.14285714 0.125 ]
[0.2 0.16666667 0.14285714 0.125 0.11111111]]
This is vector b: [0.1 1. 1. 1. 1. ]
Condition number: 476607.2502425855
The perturbed solution: [ -17.5 150. -315. 140. 63. ]
The unperturbed solution: [ 5. -120. 630. -1120.00000001
630. ]
Check soltuion recovers perturbed b: [0.1 1. 1. 1. 1. ]
Example from MathWorks, Cleve Moler, What is a Matrix Condition Number?
Consider
Take \(\mathbf{b}\) to be the first column of \(A\), \(\mathbf{b} = [ 4.1, 9.7]^T\). Solve for \(\mathbf{x}\). Add \(.01\) to the first element of \(\mathbf{b}\) and notice how the solution changes.
Perturb \(A\) slightly by,
Notice how the solution for \(\mathbf{x}\) changes significantly.
A= np.array([[4.1, 2.8],[9.7 ,6.6]])
print("Condition number for A", np.linalg.cond(A))
b=A[:,0]
print("b-vector", b)
x= np.linalg.solve(A,b)
print("Solution x:", x)
Condition number for A 1622.9993838565106
b-vector [4.1 9.7]
Solution x: [1. 0.]
# perturb b
bp = np.array([0.01, 0]) + b
print("perturbed b-vector:\n", bp)
xp = np.linalg.solve(A, bp)
print("pertrubed solution for x:\n", xp )
perturbed b-vector:
[4.11 9.7 ]
pertrubed solution for x:
[0.34 0.97]
error_b = np.linalg.norm(bp - b)/np.linalg.norm(b)
error_x = np.linalg.norm(xp-x)/np.linalg.norm(x)
kappa = np.linalg.cond(A)
print("error in b-vector:\n", error_b)
print("error in x-vector:\n", error_x)
print("condition number times error in b-vector:\n", kappa*error_b)
error in b-vector:
0.000949585833500486
error in x-vector:
1.1732433677630363
condition number times error in b-vector:
1.5411772226901599
dA = np.array([[0, 0],[-.024, .008]])
print("dA:\n", dA)
print("A + dA:\n", A + dA)
dA:
[[ 0. 0. ]
[-0.024 0.008]]
A + dA:
[[4.1 2.8 ]
[9.676 6.608]]
# solve for x
print("perturbed solution:\n", np.linalg.solve(A + dA, b) )
perturbed solution:
[ 1.56387916e+13 -2.28996591e+13]
np.linalg.inv(A + dA)
array([[-1.53781451e+15, 6.51616316e+14],
[ 2.25179981e+15, -9.54152463e+14]])
np.linalg.cond(A + dA)
np.float64(8675056625322446.0)
Example from MIT Dynamics Systems and Control Course This example is from an MIT course on Dynamic Systems and Control, chapter 4.
Consider
What is the solution to \(A \mathbf{x}= \mathbf{b}\) for \(\mathbf{b} = [ 1, -1 ]^T\). What happens to the solution when you perturb \(A\) by a small amount on the order of 0.1%, by
Here is the Python solution.
# Matrix from example above
A = np.array([[100, 100],[100.2, 100]])
A
array([[100. , 100. ],
[100.2, 100. ]])
# b-vector
b = np.array([1, -1])
# compute the inverse
A_inv = np.linalg.inv(A)
A_inv
array([[-5. , 5. ],
[ 5.01, -5. ]])
# unperturbed solution
x = A_inv@b
print("The unperturbed solution is:", x)
The unperturbed solution is: [-10. 10.01]
# perturbation
dA = np.array([[0, 0],[-.1, 0]])
A + dA
array([[100. , 100. ],
[100.1, 100. ]])
# pertrubed inverse
Ap_inv=np.linalg.inv(A + dA)
Ap_inv
array([[-10. , 10. ],
[ 10.01, -10. ]])
# perturbed solution
xp = Ap_inv@b
print("The perturbed solution is:", xp)
The perturbed solution is: [-20. 20.01]
dx=x - xp
a = np.linalg.norm(dx)/np.linalg.norm(x)
a
np.float64(0.9995001250624836)
b = np.linalg.norm(dA)/np.linalg.norm(A)
b
np.float64(0.0004997499377186054)
kappa = np.linalg.cond(A)
kappa
np.float64(2002.0015004997722)
kappa*b
np.float64(1.0005001251873158)
5.6.3. Matrix 2-Norm, SVD, and Condition Number#
The 2-norm (also called the spectral norm or operator norm) of a matrix measures how much a matrix can stretch a vector in Euclidean space. It is defined as the maximum singular value of the matrix. A high 2-norm means the matrix can stretch vectors significantly, whereas a low 2-norm means the matrix causes little distortion.
For a matrix \(A\), the 2-norm is given by
where \(\mathbf{x}\) is any nonzero vector, \(\|A\mathbf{x}\|_2\) is the 2-norm of \(A\mathbf{x}\), and \(\|\mathbf{x}\|_2\) is the 2-norm of \(\mathbf{x}\).
Alternatively, the 2-norm can be computed as
where \(\sigma_{\max}(A)\) is the largest singular value of \(A\), which can be computed via singular value decomposition (SVD).
Properties of the 2-Norm:
Non-negativity: \(\|A\|_2 \geq 0\), and \(\|A\|_2 = 0\) if and only if \(A\) is the zero matrix.
Scaling: For any scalar \(k\), \(\|kA\|_2 = |k| \|A\|_2\).
Triangle inequality: \(\|A + B\|_2 \leq \|A\|_2 + \|B\|_2\).
Submultiplicative property: \(\|AB\|_2 \leq \|A\|_2 \|B\|_2\).
Singular Value Decomposition: The SVD of a matrix \(A\) is
where \(U \in \mathbb{R}^{m \times m}\) is orthogonal (\(U^T U = I\)), \(\Sigma \in \mathbb{R}^{m \times n}\) is diagonal with non-negative entries, and \(V \in \mathbb{R}^{n \times n}\) is orthogonal (\(V^T V = I\)).
The diagonal entries \(\sigma_1, \sigma_2, \dots, \sigma_r\) of \(\Sigma\) are called the singular values of \(A\). Singular values describe how much \(A\) stretches vectors along different directions. We have \(\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r \geq 0\), where \(\sigma_1\) is the maximum stretching and \(\sigma_r\) is the minimum stretching.
Matrix Condition Number from SVD: The condition number of \(A\) (with respect to the 2-norm) is defined as
where \(\sigma_{\max}(A)\) is the largest singular value and \(\sigma_{\min}(A)\) is the smallest nonzero singular value.
Example: Below the condition number and matrix norm for a \(6 \times 6\) Hilbert matrix is computed via SVD and shown to match np.linalg.cond() and np.linalg.norm().
# 6 by 6 Hilbert matrix
H = hilbert(6)
print("6 by 6 Hilbert matrix:\n",H)
6 by 6 Hilbert matrix:
[[1. 0.5 0.33333333 0.25 0.2 0.16666667]
[0.5 0.33333333 0.25 0.2 0.16666667 0.14285714]
[0.33333333 0.25 0.2 0.16666667 0.14285714 0.125 ]
[0.25 0.2 0.16666667 0.14285714 0.125 0.11111111]
[0.2 0.16666667 0.14285714 0.125 0.11111111 0.1 ]
[0.16666667 0.14285714 0.125 0.11111111 0.1 0.09090909]]
print("Condition number from linalg.cond:",np.linalg.cond(H))
Condition number from linalg.cond: 14951058.642254734
print("2-norm from linalg.norm:",np.linalg.norm(H,2))
2-norm from linalg.norm: 1.6188998589243393
# SVD factorization
U,S,VT=np.linalg.svd(H)
print("Max singular value, compare to 2-norm")
np.max(S)
Max singular value, compare to 2-norm
np.float64(1.6188998589243393)
# remove any zero elements in S
idx=np.nonzero(S)
S=S[idx]
print("Ratio of maximum and minimum (smallest non-zero) singular values, compare to condition number.")
np.max(S)/np.min(S)
Ratio of maximum and minimum (smallest non-zero) singular values, compare to condition number.
np.float64(14951058.642254738)
A = np.array([[11, 10, 14], [12, 11, -13], [14, 13, -66]])
b = np.array([1.001, .999, 1.001])
x = np.linalg.solve(A, b)
x
array([-0.683, 0.843, 0.006])
A@x
array([1.001, 0.999, 1.001])
bp = np.ones(3)
xp = np.linalg.solve(A,bp)
xp
array([ 1.00000000e+00, -1.00000000e+00, -1.66533454e-16])
np.linalg.norm(bp-b)/np.linalg.norm(b)*100
np.float64(0.09996663337406798)
np.linalg.norm(xp-x)/np.linalg.norm(x)*100
np.float64(230.0355196923597)