Matrix Decomposition Demystified

Eigen Decomposition, SVD, and Pseudo-inverse Matrix Made Easy

Fundamentals
Published: August 31, 2021

1 Matrix Decomposition

In this article, we discuss the following Matrix Decomposition topics:

  • Square Matrix
  • Eigenvalue and Eigenvector
  • Symmetric Matrix
  • Eigen Decomposition
  • Orthogonal Matrix
  • Singular Value Decomposition
  • Pseudo-inverse Matrix

2 Square Matrix

Eigen Decomposition works only with square matrices.

As a quick reminder, let’s look at a square matrix. In square matrices, the number of rows and columns are the same.

For example,

\[ A = \begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix} \]

That was easy. Let’s continue with Eigenvalue and Eigenvector concepts.

3 Eigenvalue and Eigenvector

When a square matrix A and a vector \(x\) have the following relationship:

\[ A \mathbf{x} = \lambda \mathbf{x} \]

we call \(\lambda\) an eigenvalue and the vector \(\mathbf{x}\) an eigenvector of \(A\). Note that \(\lambda\) is a scalar.

The above means that multiplying vector \(x\) with matrix \(A\) produces the same result as multiplying vector \(x\) with scalar value \(\lambda\).

Let’s move everything to the left side of the equation to find the condition for having eigenvalues and eigenvectors.

\[ (A - \lambda I) \mathbf{x} = \mathbf{0} \]

For vector \(\mathbf{x}\) to be non-zero, the matrix \((A - \lambda I)\) must have no inverse. In other words, the determinant of \((A - \lambda I)\) must be zero.

\[ \det(A - \lambda I) = 0 \]

Let’s use the 2x2 square matrix as an example to calculate the determinant:

\[ A = \begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix} \]

So, \((A — \lambda I)\) is:

\[ A - \lambda I = \begin{bmatrix} 4 - \lambda & 1 \\ 3 & 2 - \lambda \end{bmatrix} \]

And we calculate the determinant (and we want it to be zero):

\[ \begin{aligned} \det(A - \lambda I) &= \begin{vmatrix} 4 - \lambda & 1 \\ 3 & 2 - \lambda \end{vmatrix} \\\\ &= (4 - \lambda)(2 - \lambda) - 3 \\\\ &= 8 - 6\lambda + \lambda^2 - 3 \\\\ &= \lambda^2 - 6\lambda + 5 \\\\ &= (\lambda - 1)(\lambda - 5) \\\\ &= 0 \end{aligned} \]

Therefore, \(\lambda\) is either 1 or 5.

Let’s define the eigenvector as follows:

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

As a reminder, we have the following equation for eigenvalue and eigenvector:

\[ (A - \lambda I) \mathbf{x} = \mathbf{0} \]

We can calculate the eigenvector \(x\) for \(\lambda=1\) as follows:

\[ \begin{aligned} (A - \lambda I)\mathbf{x} &= \begin{bmatrix} 4 - 1 & 1 \\ 3 & 2 - 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\\\ &\therefore 3 x_1 + x_2 = 0 \end{aligned} \]

To satisfy the above conditions, \(x_1\) and \(x_2\) are:

\[ \begin{aligned} x_1 &= -\frac{1}{3} t \\ x_2 &= t \end{aligned} \]

Although \(t\) can be any value, we usually normalize the eigenvector so that its L2 norm is 1; otherwise, there is an infinite number of possible solutions:

\[ x_1^2 + x_2^2 = 1 \]

Substituting \(x_1 = -\frac{1}{3}t\) and \(x_2 = t\) gives \(\frac{1}{9}t^2 + t^2 = 1\), so \(t = \pm\frac{3}{\sqrt{10}}\). The eigenvector for the eigenvalue \(\lambda=1\) is therefore:

\[ \mathbf{x} = \begin{bmatrix} -\frac{1}{\sqrt{10}} \\ \frac{3}{\sqrt{10}} \end{bmatrix} \]

or

\[ \mathbf{x} = \begin{bmatrix} \frac{1}{\sqrt{10}} \\ -\frac{3}{\sqrt{10}} \end{bmatrix} \]

They represent the same eigenvector, just pointing in opposite directions. So, I’ll choose the first one as the eigenvector for \(\lambda=1\).

Let’s make sure this works as intended:

\[ \begin{aligned} A\mathbf{x} &= \begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{10}} \\ \frac{3}{\sqrt{10}} \end{bmatrix} \\\\ &= \begin{bmatrix} 4 \cdot -\frac{1}{\sqrt{10}} + 1 \cdot \frac{3}{\sqrt{10}} \\ 3 \cdot -\frac{1}{\sqrt{10}} + 2 \cdot \frac{3}{\sqrt{10}} \end{bmatrix} \\\\ & = \begin{bmatrix} -\frac{1}{\sqrt{10}} \\ \frac{3}{\sqrt{10}} \end{bmatrix} \\\\ &= 1 \mathbf{x} \end{aligned} \]

We can solve for \(\lambda=5\) using the same process:

\[ \mathbf{x} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \]
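
If you want to double-check the hand calculation, here is a quick NumPy sketch (using the same 2×2 matrix) that confirms the eigenvalues 1 and 5 and the unit eigenvectors, possibly with flipped signs and in a different order:

import numpy as np

# Verify the hand-computed eigenvalues and eigenvectors of the 2x2 example
A = np.array([
    [4, 1],
    [3, 2]], dtype=np.float64)

values, vectors = np.linalg.eig(A)

print(values)   # eigenvalues 5 and 1 (order not guaranteed)
print(vectors)  # columns are the unit eigenvectors (signs may flip)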

For 3×3 or larger matrices, it becomes too tedious to calculate all that manually. We can write a Python script with NumPy to do the same:

import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]])

print('A=')
print(A)
print()

values, vectors = np.linalg.eig(A)

print('eigenvalues')
print(values)
print()

print('eigenvectors')
print(vectors)

Below is the output:

A=
[[1 2 3]
 [4 5 6]
 [7 8 9]]

eigenvalues
[ 1.61168440e+01 -1.11684397e+00 -1.30367773e-15]

eigenvectors
[[-0.23197069 -0.78583024 0.40824829]
 [-0.52532209 -0.08675134 -0.81649658]
 [-0.8186735 0.61232756 0.40824829]]

We can confirm they work as expected:

λ = values[0]
x = vectors[:, 0]

print('λ * x')
print(λ * x)
print()
print('A @ x')
print(A @ x)

Below is the output:

λ * x
[ -3.73863537 -8.46653421 -13.19443305]

A @ x
[ -3.73863537 -8.46653421 -13.19443305]

This works fine, but if you want to keep the eigenvector in column format, you can do the following:

λ = values[0]
x = vectors[:, 0:1]

print('λ * x')
print(λ * x)
print()
print('A @ x')
print(A @ x)

Below is the output:

λ * x
[[ -3.73863537]
 [ -8.46653421]
 [-13.19443305]]

A @ x
[[ -3.73863537]
 [ -8.46653421]
 [-13.19443305]]

4 Symmetric Matrix

Eigenvectors of a symmetric matrix that correspond to distinct eigenvalues are orthogonal to each other. This fact will become helpful later on, so let’s prove it here.

Suppose \(\lambda_1\) and \(\lambda_2\) (\(\lambda_1 \ne \lambda_2\)) are eigenvalues, and \(\mathbf{x}_1\) and \(\mathbf{x}_2\) are the corresponding eigenvectors:

\[ \begin{aligned} \lambda_1 \mathbf{x}_1 \cdot \mathbf{x}_2 &= (\lambda_1 \mathbf{x}_1)^\top \mathbf{x}_2 \\ &= (A \mathbf{x}_1)^\top \mathbf{x}_2 \\ &= \mathbf{x}_1^\top A^\top \mathbf{x}_2 \\ &= \mathbf{x}_1^\top A \mathbf{x}_2 \qquad \text{A is symmetric} \\ &= \mathbf{x}_1^\top \lambda_2 \mathbf{x}_2 \\ &= \lambda_2 \mathbf{x}_1^\top \mathbf{x}_2 \\ &= \lambda_2 \mathbf{x}_1 \cdot \mathbf{x}_2 \\ \end{aligned} \]

Therefore,

\[ \lambda_1 \mathbf{x}_1 \cdot \mathbf{x}_2 = \lambda_2 \mathbf{x}_1 \cdot \mathbf{x}_2 \]

However, \(\lambda_1 \ne \lambda_2\). As such, the following must be true:

\[ \mathbf{x}_1 \cdot \mathbf{x}_2 = 0 \]

Therefore, eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are orthogonal to each other.
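
We can also verify this numerically. Below is a small NumPy sketch (using an arbitrary symmetric matrix with distinct eigenvalues) showing that the pairwise dot products of the eigenvectors are essentially zero:

import numpy as np

# A symmetric matrix with distinct eigenvalues (1, 2, and 4)
A = np.array([
    [2, 1, 0],
    [1, 3, 1],
    [0, 1, 2]], dtype=np.float64)

values, vectors = np.linalg.eig(A)

# Dot products between different eigenvectors should be ~0
print(vectors[:, 0] @ vectors[:, 1])
print(vectors[:, 0] @ vectors[:, 2])
print(vectors[:, 1] @ vectors[:, 2])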

Now we are ready to discuss how Eigen Decomposition works.

5 Eigen Decomposition

Using eigenvalues and eigenvectors, we can decompose a square matrix A as follows:

\[ A = Q \Lambda Q^{-1} \]

Q is a matrix that has eigenvectors in the columns.

\[ Q = [ \mathbf{x}_1 \ \mathbf{x}_2 \ \dots \ \mathbf{x}_n] \]

\(\Lambda\) (uppercase lambda) is a diagonal matrix that has eigenvalues as the diagonal elements:

\[ \Lambda = \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix} \]

We put eigenvalues in descending order to make the diagonal matrix \(\Lambda\) unique:

\[ \lambda_1 \gt \lambda_2 \gt \dots \gt \lambda_n \]

To prove that such Eigen Decomposition is possible, we adjust the equation slightly:

\[ \begin{aligned} A &= Q \Lambda Q^{-1} \\ AQ &= Q \Lambda \end{aligned} \]

And we will show that \(AQ = Q \Lambda\) is true.

\[ \begin{aligned} AQ &= A [ \mathbf{x}_1 \ \mathbf{x}_2 \ \dots \ \mathbf{x}_n ] \\ &= [ A\mathbf{x}_1 \ A\mathbf{x}_2 \ \dots \ A\mathbf{x}_n] \\ &= [ \lambda_1 \mathbf{x}_1 \ \lambda_2 \mathbf{x}_2 \ \dots \ \lambda_n \mathbf{x}_n] \\ &= [ \mathbf{x}_1 \ \mathbf{x}_2 \ \dots \ \mathbf{x}_n ] \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix} \\ &= Q \Lambda \end{aligned} \]

In other words, AQ is equivalent to each eigenvector inside matrix Q multiplied by the corresponding eigenvalue.

So, we have proved that Eigen Decomposition is possible.

Let’s try Eigen Decomposition with NumPy:

import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]], dtype=np.float64)

print('A=')
print(A)
print()

# Eigenvalue and Eigenvector
values, vectors = np.linalg.eig(A)

Λ = np.diag(values)
Q = vectors

QΛQinv = Q @ Λ @ np.linalg.inv(Q)

print('QΛQinv=')
print(QΛQinv)

Below is the output:

A=
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

QΛQinv=
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

6 Orthogonal Matrix

As mentioned before, when matrix A is symmetric, eigenvectors in matrix Q are orthogonal to each other.

When we also make the L2 norm of all eigenvectors 1 (orthonormal), we call Q an orthogonal matrix.

When Q is an orthogonal matrix,

\[ Q Q^\top = I \]

As such,

\[ Q^\top = Q^{-1} \]

Therefore,

\[ A = Q \Lambda Q^{-1} = Q \Lambda Q^\top \]

The above fact will be helpful when we discuss the singular value decomposition.

Let’s compute \(Q \Lambda Q^\top\) with NumPy:

import numpy as np

# Symmetric Matrix
A = np.array([
    [1, 2, 3],
    [2, 5, 6],
    [3, 6, 9]], dtype=np.float64)

print('A=')
print(A)
print()

# Eigenvalue and Eigenvector
values, vectors = np.linalg.eig(A)

Λ = np.diag(values)
Q = vectors

QΛQT = Q @ Λ @ Q.T

print('QΛQT=')
print(QΛQT)

Below is the output:

A=
[[1. 2. 3.]
 [2. 5. 6.]
 [3. 6. 9.]]

QΛQT=
[[1. 2. 3.]
 [2. 5. 6.]
 [3. 6. 9.]]

It’s fun to see the theory works precisely as expected.

Also, let’s check that Q is an orthogonal matrix:

QQT = Q @ Q.T

print('QQT=')
print(QQT)

Below is the output:

QQT=
[[ 1.00000000e+00 5.55111512e-17 -1.80411242e-16]
 [ 5.55111512e-17 1.00000000e+00 0.00000000e+00]
 [-1.80411242e-16 0.00000000e+00 1.00000000e+00]]

Here we see the limitation of numerical computing. Non-diagonal elements should be 0; instead, some are very small values.
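
In practice, we compare against the identity within a tolerance rather than expecting exact zeros. Continuing from the snippet above, np.allclose does exactly that:

# Compare QQT with the identity matrix within a tolerance
print(np.allclose(QQT, np.eye(3)))   # True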

So far, we have been dealing with square matrices, since Eigen Decomposition works only with square matrices. Next, we’ll see SVD, which also works with non-square matrices.

7 Singular Value Decomposition

SVD exists for any matrix, even for non-square matrices.

Suppose matrix A is m×n (m ≠ n), we can still decompose matrix A as follows:

\[ A = U \Sigma V^\top \]

  • \(U\) is an orthogonal matrix (m×m).
  • \(\Sigma\) is a diagonal matrix (m×n).
  • \(V\) is an orthogonal matrix (n×n).

If that looks abstract, it helps to picture the structure of \(\Sigma\): its top-left block is a diagonal matrix with the singular values \(\sigma_1, \dots, \sigma_s\) (more on these later) as the diagonal elements, and all other elements of \(\Sigma\) are zero.
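
For example, for a 3×2 matrix like the one we will use in the NumPy example later, \(\Sigma\) has this shape:

\[ \Sigma = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \\ 0 & 0 \end{bmatrix} \]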

As a convention, we sort the singular values in descending order:

\[ \sigma_1 \ \ge \ \sigma_2 \ \ge \ \dots \ \ge \ \sigma_s \ \ge \ 0 \]

The number of non-zero singular values equals the rank of matrix A, which is the number of linearly independent column vectors.
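
A small NumPy check illustrates this (the matrix below has rank 2, so only two singular values are meaningfully non-zero):

import numpy as np

# A 3x3 matrix of rank 2
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]], dtype=np.float64)

S = np.linalg.svd(A, compute_uv=False)

print(S)                          # the third value is ~0
print(np.linalg.matrix_rank(A))   # 2
print(np.sum(S > 1e-10))          # 2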

So, how can we create such a decomposition?

The main idea is that we make a square matrix like the one below:

\[ A A^\top \]

or

\[ A^\top A \]

We can choose either, but using the one with fewer elements is easier.

For example, if matrix A is 100×10,000, \(AA^\top\) is 100×100 and \(A^\top A\) is 10,000×10,000. So, I would choose \(AA^\top\). How about you?

Suppose we perform SVD on matrix A using \(AA^T\):

\[ \begin{aligned} AA^\top &= U \Sigma V^\top (U \Sigma V^\top)^\top \\ &= U \Sigma V^\top V \Sigma^\top U^\top \qquad \text{V is an orthogonal matrix} \\ &= U \Sigma \Sigma^\top U^\top \\ &= U \Sigma_m^2 U^\top \end{aligned} \]

where

\[ \Sigma_m^2 = \Sigma \Sigma^\top \]

is a diagonal matrix (m×m) with the squared singular values as its top-left diagonal elements. These values are eigenvalues of \(AA^\top\) since we have eigen-decomposed \(AA^\top\) as follows:

\[ AA^\top = U \Sigma_m^2 U^\top \]

The column vectors in matrix U are eigenvectors because:

\[ A A^\top U = U \Sigma_m^2 \qquad \text{U is an orthogonal matrix} \]

In other words, if we can eigen-decompose \(AA^\top\), we can calculate U and the singular values, and then we can calculate V.

We can consider the same calculation with \(A^TA\):

\[ \begin{aligned} A^\top A &= (U \Sigma V^\top)^\top U \Sigma V^\top \\ &= V \Sigma^\top U^\top U \Sigma V^\top \qquad \text{U is an orthogonal matrix} \\ &= V \Sigma^\top \Sigma V^\top \\ &= V \Sigma_n^2 V^\top \end{aligned} \]

where

\[ \Sigma_n^2 = \Sigma^\top \Sigma \]

is a diagonal matrix (n×n) with the squared singular values as its top-left diagonal elements. These values are eigenvalues of \(A^\top A\) since we have eigen-decomposed \(A^\top A\) as follows:

\[ A^\top A = V \Sigma_n^2 V^\top \]

The column vectors in matrix V are eigenvectors because:

\[ A^\top AV = V \Sigma_n^2 \]

In other words, if we can eigen-decompose \(A^\top A\), we can calculate V and the singular values, and then we can calculate U.

So, if we eigen-decompose either \(AA^T\) or \(A^TA\), we can perform singular value decomposition.

Again, we should choose the smaller of \(AA^\top\) and \(A^\top A\) to make our life easier.
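
Before handing everything to np.linalg.svd, here is a minimal sketch of that recipe, assuming a 3×2 matrix so that \(A^\top A\) is the smaller (2×2) Gram matrix. It recovers V and the singular values from the eigen decomposition of \(A^\top A\), then rebuilds the matching columns of U from \(A\mathbf{v}_i = \sigma_i \mathbf{u}_i\) (this gives the thin SVD, not the full 3×3 U):

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

# Eigen-decompose the smaller Gram matrix A^T A (symmetric, so use eigh)
values, V = np.linalg.eigh(A.T @ A)

# Sort eigenvalues (and eigenvectors) in descending order
order = np.argsort(values)[::-1]
values, V = values[order], V[:, order]

# Singular values are the square roots of the eigenvalues of A^T A
sigmas = np.sqrt(values)

# Recover the matching columns of U from A v_i = sigma_i u_i
U = A @ V / sigmas

# Rebuild A to confirm the decomposition
Σ = np.diag(sigmas)
print(U @ Σ @ V.T)   # should print the original A (up to rounding)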

Let’s try SVD with NumPy:

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

print('A=')
print(A)
print()

U, S, VT = np.linalg.svd(A)

print('U=')
print(U)
print()

print('S=')
print(S)
print()

print('VT=')
print(VT)
print()

Below is the output:

A=
[[1. 2.]
 [3. 4.]
 [5. 6.]]

U=
[[-0.2298477 0.88346102 0.40824829]
 [-0.52474482 0.24078249 -0.81649658]
 [-0.81964194 -0.40189603 0.40824829]]

S=
[9.52551809 0.51430058]

VT=
[[-0.61962948 -0.78489445]
 [-0.78489445 0.61962948]]

Let’s set up matrix Σ as follows:

Σ = np.zeros_like(A, dtype=np.float64)
Σ[:2] = np.diag(S)

print('Σ=')
print(Σ)
print()

Below is the output:

Σ=
[[9.52551809 0. ]
 [0. 0.51430058]
 [0. 0. ]]

Now, we can confirm \(A = U\Sigma V^\top\).

print('UΣVT=')
print(U @ Σ @ VT)
print()

Below is the output:

UΣVT=
[[1. 2.]
 [3. 4.]
 [5. 6.]]

Now that we know how SVD works, we are ready to work on pseudo-inverse matrices.

8 Pseudo-inverse Matrix

An inverse matrix does not always exist, even for a square matrix. However, a pseudo-inverse matrix, also called the Moore-Penrose inverse, exists for any matrix, including non-square ones.

For example, suppose matrix A is m×n:

\[ A \mathbf{x} = \mathbf{y} \]

Using a pseudo-inverse matrix \(A^+\), we can perform the following conversion:

\[ \mathbf{x} = A^+ \mathbf{y} \]

We define a pseudo-inverse matrix \(A^+\) as:

\[ A^+ = V D^+ U^\top \]

V and U are from SVD:

\[ A = U\Sigma V^\top \]

We construct \(D^+\) by transposing \(\Sigma\) and inverting all the non-zero diagonal elements. Suppose \(\Sigma\) is defined as follows:

\[ \Sigma = \begin{bmatrix} \sigma_1 & & & & & & \\ & \sigma_2 & & & & & \\ & & \ddots & & & & \\ & & & \sigma_s & & & \\ & & & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 0 \end{bmatrix} \]

Then \(D^+\) is defined as follows:

\[ D^+ = \begin{bmatrix} \frac{1}{\sigma_1} & & & & & & \\ & \frac{1}{\sigma_2} & & & & & \\ & & \ddots & & & & \\ & & & \frac{1}{\sigma_s} & & & \\ & & & & 0 & & \\ & & & & & \ddots & \\ & & & & & & 0 \end{bmatrix} \]

Now, we can see how \(A^+A\) works:

\[ \begin{aligned} A^+ A &= (VD^+U^\top)(U\Sigma V^\top) \\ &= VD^+\Sigma V^\top \qquad\qquad \text{Note: } D^+\Sigma = I \text{ when } A \text{ has full column rank} \\ &= VV^\top \\ &= I \end{aligned} \]

In the same way, \(AA^+ = I\) when A has full row rank.

In summary, if we can perform SVD on matrix A, we can calculate its pseudo-inverse as \(A^+ = VD^+U^\top\).
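
Here is a minimal sketch of that construction, assuming the same 3×2 matrix as before; it builds \(D^+\) from the singular values and compares the result with np.linalg.pinv:

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

U, S, VT = np.linalg.svd(A)

# D+ is the transpose of Σ with the non-zero singular values inverted
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:len(S), :len(S)] = np.diag(1.0 / S)

A_plus = VT.T @ D_plus @ U.T

print(A_plus)
print(np.linalg.pinv(A))   # should match (up to rounding)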

Let’s try the pseudo-inverse with NumPy:

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

AP = np.linalg.pinv(A)

print('AP @ A')
print(AP @ A)

Below is the output:

AP @ A
[[ 1.0000000000000004e+00 0.0000000000000000e+00]
 [-4.4408920985006262e-16 9.9999999999999956e-01]]

Congratulations! You’ve made it this far. I hope you’ve enjoyed it!
