Matrix Decomposition Demystified

Eigen Decomposition, SVD, and Pseudo-inverse Matrix Made Easy

Fundamentals
Published

August 31, 2021


1 Matrix Decomposition

In this article, we discuss the following Matrix Decomposition topics:

  • Square Matrix
  • Eigenvalue and Eigenvector
  • Symmetric Matrix
  • Eigen Decomposition
  • Orthogonal Matrix
  • Singular Value Decomposition
  • Pseudo-inverse Matrix

2 Square Matrix

Eigen Decomposition works only with square matrices.

As a quick reminder, let’s look at a square matrix. A square matrix has the same number of rows and columns.

For example,

$$A = \begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix}$$

That was easy. Let’s continue with Eigenvalue and Eigenvector concepts.

3 Eigenvalue and Eigenvector

When a square matrix A and a non-zero vector x have the following relationship:

$$Ax = \lambda x$$

we call λ an eigenvalue and the vector x an eigenvector. λ is a scalar.

The above means that multiplying vector x with matrix A produces the same result as multiplying vector x with scalar value λ.

Let’s move everything to the left side of the equation to find the condition for having eigenvalues and eigenvectors.

$$(A - \lambda I)x = 0$$

For the vector x to be non-zero, the matrix $(A - \lambda I)$ must not have an inverse. In other words, the determinant of $(A - \lambda I)$ must be zero.

$$\det(A - \lambda I) = 0$$

Let’s use the 2x2 square matrix as an example to calculate the determinant:

$$A = \begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix}$$

So, $A - \lambda I$ is:

$$A - \lambda I = \begin{bmatrix} 4-\lambda & 1 \\ 3 & 2-\lambda \end{bmatrix}$$

And we calculate the determinant (and we want it to be zero):

$$\det(A - \lambda I) = \begin{vmatrix} 4-\lambda & 1 \\ 3 & 2-\lambda \end{vmatrix} = (4-\lambda)(2-\lambda) - 3 = 8 - 6\lambda + \lambda^2 - 3 = \lambda^2 - 6\lambda + 5 = (\lambda - 1)(\lambda - 5) = 0$$

Therefore, λ is either 1 or 5.

Let’s define the eigenvector as follows:

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

As a reminder, we have the following equation for eigenvalues and eigenvectors:

$$(A - \lambda I)x = 0$$

We can calculate the eigenvector x for λ=1 as follows:

$$(A - \lambda I)x = \begin{bmatrix} 4-1 & 1 \\ 3 & 2-1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; 3x_1 + x_2 = 0$$

To satisfy the above conditions, x1 and x2 are:

$$x_1 = -\frac{1}{3}t, \qquad x_2 = t$$

Although t can be any value, we usually normalize the eigenvector so that its L2 norm is 1 since, otherwise, there would be an infinite number of possible solutions:

$$x_1^2 + x_2^2 = 1$$

The eigenvector for the eigenvalue λ=1 is:

$$x = \begin{bmatrix} \frac{1}{\sqrt{10}} \\ -\frac{3}{\sqrt{10}} \end{bmatrix}$$

or

$$x = \begin{bmatrix} -\frac{1}{\sqrt{10}} \\ \frac{3}{\sqrt{10}} \end{bmatrix}$$

They represent the same eigenvector; one simply points in the opposite direction of the other. So, I’ll choose the first one as the eigenvector for λ = 1.

Let’s make sure this works as intended:

$$Ax = \begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{10}} \\ -\frac{3}{\sqrt{10}} \end{bmatrix} = \begin{bmatrix} 4 \cdot \frac{1}{\sqrt{10}} + 1 \cdot \left(-\frac{3}{\sqrt{10}}\right) \\ 3 \cdot \frac{1}{\sqrt{10}} + 2 \cdot \left(-\frac{3}{\sqrt{10}}\right) \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{10}} \\ -\frac{3}{\sqrt{10}} \end{bmatrix} = 1 \cdot x$$

We can solve for λ=5 using the same process:

$$x = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$
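
Before moving on, we can verify the hand calculation numerically. Below is a quick check of my own (not part of the derivation) that feeds the same 2×2 matrix to NumPy; note that NumPy may list the eigenvalues in a different order and may flip the sign of an eigenvector:

import numpy as np

# The same 2x2 matrix used in the hand calculation above
A = np.array([
    [4, 1],
    [3, 2]], dtype=np.float64)

values, vectors = np.linalg.eig(A)

print('eigenvalues')
print(values)    # expect 5 and 1 (possibly in a different order)
print()

print('eigenvectors (as columns)')
print(vectors)   # expect [1/sqrt(2), 1/sqrt(2)] and [1/sqrt(10), -3/sqrt(10)] up to sign
print()

# Confirm A @ x equals λ * x for each eigenpair
for λ, x in zip(values, vectors.T):
    print(np.allclose(A @ x, λ * x))   # True for both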

For 3×3 or larger matrices, it becomes too tedious to calculate all that manually. We can write a Python script with NumPy to do the same:

import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]])

print('A=')
print(A)
print()

values, vectors = np.linalg.eig(A)

print('eigenvalues')
print(values)
print()

print('eigenvectors')
print(vectors)

Below is the output:

A=
[[1 2 3]
 [4 5 6]
 [7 8 9]]

eigenvalues
[ 1.61168440e+01 -1.11684397e+00 -1.30367773e-15]

eigenvectors
[[-0.23197069 -0.78583024 0.40824829]
 [-0.52532209 -0.08675134 -0.81649658]
 [-0.8186735 0.61232756 0.40824829]]

We can confirm they work as expected:

λ = values[0]
x = vectors[:, 0]

print('λ * x')
print(λ * x)
print()
print('A @ x')
print(A @ x)

Below is the output:

λ * x
[ -3.73863537 -8.46653421 -13.19443305]

A @ x
[ -3.73863537 -8.46653421 -13.19443305]

This works fine, but if you want to keep the eigenvector in column format, you can do the following:

λ = values[0]
x = vectors[:, 0:1]

print('λ * x')
print(λ * x)
print()
print('A @ x')
print(A @ x)

Below is the output:

λ * x
[[ -3.73863537]
 [ -8.46653421]
 [-13.19443305]]

A @ x
[[ -3.73863537]
 [ -8.46653421]
 [-13.19443305]]

4 Symmetric Matrix

Eigenvectors of a symmetric matrix are orthogonal to each other. This fact will become helpful later on. So, let’s prove it here.

Suppose $\lambda_1$ and $\lambda_2$ ($\lambda_1 \neq \lambda_2$) are eigenvalues, and $x_1$ and $x_2$ are the corresponding eigenvectors:

$$\begin{aligned}
\lambda_1 x_1^\top x_2 &= (\lambda_1 x_1)^\top x_2 \\
&= (A x_1)^\top x_2 \\
&= x_1^\top A^\top x_2 \\
&= x_1^\top A x_2 \quad (A \text{ is symmetric}) \\
&= x_1^\top \lambda_2 x_2 \\
&= \lambda_2 x_1^\top x_2
\end{aligned}$$

Therefore,

$$\lambda_1 x_1^\top x_2 = \lambda_2 x_1^\top x_2$$

However, $\lambda_1 \neq \lambda_2$. As such, the following must be true:

$$x_1^\top x_2 = 0$$

Therefore, eigenvectors of a symmetric matrix are orthogonal to each other.
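
We can also see this orthogonality numerically. Below is a small sketch (the symmetric matrix here is just an arbitrary example I made up) that checks the dot products between distinct eigenvectors are essentially zero:

import numpy as np

# An arbitrary symmetric matrix (S equals its own transpose)
S = np.array([
    [2, 1, 0],
    [1, 3, 1],
    [0, 1, 2]], dtype=np.float64)

values, vectors = np.linalg.eig(S)

# Dot products between distinct eigenvectors should be (numerically) zero
print(vectors[:, 0] @ vectors[:, 1])
print(vectors[:, 0] @ vectors[:, 2])
print(vectors[:, 1] @ vectors[:, 2])

For symmetric matrices, NumPy also provides np.linalg.eigh, which is generally preferred because it guarantees real eigenvalues and orthonormal eigenvectors.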

Now we are ready to discuss how Eigen Decomposition works.

5 Eigen Decomposition

Using eigenvalues and eigenvectors, we can decompose a square matrix A as follows:

$$A = Q \Lambda Q^{-1}$$

Q is a matrix that has eigenvectors in the columns.

$$Q = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$$

Λ (uppercase lambda) is a diagonal matrix that has eigenvalues as the diagonal elements:

$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$

We put eigenvalues in descending order to make the diagonal matrix Λ unique:

$$\lambda_1 > \lambda_2 > \cdots > \lambda_n$$

To prove that such Eigen Decomposition is possible, we adjust the equation slightly:

$$A = Q \Lambda Q^{-1} \;\Longleftrightarrow\; AQ = Q\Lambda$$

And we will show that AQ=QΛ is true.

$$\begin{aligned}
AQ &= A \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \\
&= \begin{bmatrix} Ax_1 & Ax_2 & \cdots & Ax_n \end{bmatrix} \\
&= \begin{bmatrix} \lambda_1 x_1 & \lambda_2 x_2 & \cdots & \lambda_n x_n \end{bmatrix} \\
&= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \\
&= Q\Lambda
\end{aligned}$$

In other words, AQ is equivalent to each eigenvector inside matrix Q multiplied by the corresponding eigenvalue.

So, we have proved that Eigen Decomposition is possible.
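
Before doing the full decomposition, we can sanity-check the intermediate identity $AQ = Q\Lambda$ numerically (a quick sketch using the same 3×3 matrix as the example below):

import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]], dtype=np.float64)

values, vectors = np.linalg.eig(A)
Q = vectors
Λ = np.diag(values)

# A @ Q and Q @ Λ should match up to floating-point error
print(np.allclose(A @ Q, Q @ Λ))   # True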

Let’s try Eigen Decomposition with NumPy:

import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]], dtype=np.float64)

print('A=')
print(A)
print()

# Eigenvalue and Eigenvector
values, vectors = np.linalg.eig(A)

Λ = np.diag(values)
Q = vectors

QΛQinv = Q @ Λ @ np.linalg.inv(Q)

print('QΛQinv=')
print(QΛQinv)

Below is the output:

A=
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

QΛQinv=
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

6 Orthogonal Matrix

As mentioned before, when matrix A is symmetric, eigenvectors in matrix Q are orthogonal to each other.

When we also make the L2 norm of all eigenvectors 1 (orthonormal), we call Q an orthogonal matrix.

When Q is an orthogonal matrix,

$$Q^\top Q = I$$

As such,

$$Q^\top = Q^{-1}$$

Therefore,

$$A = Q \Lambda Q^{-1} = Q \Lambda Q^\top$$

The above fact will be helpful when we discuss the singular value decomposition.

Let’s compute $Q \Lambda Q^\top$ with NumPy:

import numpy as np

# A symmetric matrix
A = np.array([
    [1, 2, 3],
    [2, 5, 6],
    [3, 6, 9]], dtype=np.float64)

print('A=')
print(A)
print()

# Eigenvalue and Eigenvector
values, vectors = np.linalg.eig(A)

Λ = np.diag(values)
Q = vectors

QΛQT = Q @ Λ @ Q.T

print('QΛQT=')
print(QΛQT)

Below is the output:

A=
[[1. 2. 3.]
 [2. 5. 6.]
 [3. 6. 9.]]

QΛQT=
[[1. 2. 3.]
 [2. 5. 6.]
 [3. 6. 9.]]

It’s fun to see the theory works precisely as expected.

Also, let’s check Q is an orthogonal matrix:

QQT = Q @ Q.T

print('QQT=')
print(QQT)

Below is the output:

QQT=
[[ 1.00000000e+00 5.55111512e-17 -1.80411242e-16]
 [ 5.55111512e-17 1.00000000e+00 0.00000000e+00]
 [-1.80411242e-16 0.00000000e+00 1.00000000e+00]]

Here we see the limitation of numerical computing: the off-diagonal elements should be exactly 0, but some come out as very small values instead.
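
A common way to deal with this is np.allclose, which treats such tiny residues as zero when comparing matrices (continuing from the snippet above):

print(np.allclose(QQT, np.eye(3)))   # True: Q @ Q.T is the identity up to floating-point error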

So far, we have dealt only with square matrices, since Eigen Decomposition works only with square matrices. Next, we’ll look at SVD, which also works with non-square matrices.

7 Singular Value Decomposition

SVD exists for any matrix, even for non-square matrices.

Suppose matrix A is m×n (m ≠ n). We can still decompose it as follows:

$$A = U \Sigma V^\top$$

  • U is an orthogonal matrix (m×m).
  • Σ is a diagonal matrix (m×n).
  • V is an orthogonal matrix (n×n).

If that looks too abstract, the structure of Σ helps make it concrete. The top-left part of Σ is a diagonal matrix with the singular values (more on these later) as its diagonal elements; all other elements of Σ are zeros.

As a convention, we sort the singular values in descending order:

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_s \ge 0$$

The number of non-zero singular values equals the rank of matrix A, which is the number of linearly independent column vectors.
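
In fact, NumPy’s np.linalg.matrix_rank computes the rank by counting singular values above a small tolerance. Here is a quick illustration with an arbitrary rank-deficient matrix (my own example, not one of this article’s running examples):

import numpy as np

# A rank-deficient 3x3 matrix: the third row is the sum of the first two
B = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [5, 7, 9]], dtype=np.float64)

print(np.linalg.matrix_rank(B))   # 2
print(np.linalg.svd(B)[1])        # two clearly non-zero singular values, one near zero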

So, how can we create such a decomposition?

The main idea is that we make a square matrix like the one below:

$$A A^\top$$

or

$$A^\top A$$

We can choose either, but using the one with fewer elements is easier.

For example, if matrix A is 100×10,000, then $AA^\top$ is 100×100 and $A^\top A$ is 10,000×10,000. So, I would choose $AA^\top$. How about you?

Suppose we perform SVD on matrix A using $AA^\top$:

$$\begin{aligned}
A A^\top &= U \Sigma V^\top (U \Sigma V^\top)^\top \\
&= U \Sigma V^\top V \Sigma^\top U^\top \quad (V \text{ is an orthogonal matrix}) \\
&= U \Sigma \Sigma^\top U^\top \\
&= U \Sigma_m^2 U^\top
\end{aligned}$$

where

$$\Sigma_m^2 = \Sigma \Sigma^\top$$

is an m×m diagonal matrix with the squared singular values as its top-left diagonal elements. These values are eigenvalues of $AA^\top$, since we have eigen-decomposed $AA^\top$ as follows:

$$A A^\top = U \Sigma_m^2 U^\top$$

The column vectors in matrix U are eigenvectors because:

$$A A^\top U = U \Sigma_m^2 \quad (U \text{ is an orthogonal matrix})$$

In other words, if we can eigen-decompose $AA^\top$, we can calculate U and the diagonal sigmas, and then we can calculate V (since $U^\top A = \Sigma V^\top$).

We can do the same calculation with $A^\top A$:

$$\begin{aligned}
A^\top A &= (U \Sigma V^\top)^\top U \Sigma V^\top \\
&= V \Sigma^\top U^\top U \Sigma V^\top \quad (U \text{ is an orthogonal matrix}) \\
&= V \Sigma^\top \Sigma V^\top \\
&= V \Sigma_n^2 V^\top
\end{aligned}$$

where

$$\Sigma_n^2 = \Sigma^\top \Sigma$$

is an n×n diagonal matrix with the squared singular values as its top-left diagonal elements. These values are eigenvalues of $A^\top A$, since we have eigen-decomposed $A^\top A$ as follows:

$$A^\top A = V \Sigma_n^2 V^\top$$

The column vectors in matrix V are eigenvectors because:

$$A^\top A V = V \Sigma_n^2$$

In other words, if we can eigen-decompose $A^\top A$, we can calculate V and the diagonal sigmas, and then we can calculate U (since $AV = U\Sigma$).

So, if we eigen-decompose either $AA^\top$ or $A^\top A$, we can perform singular value decomposition.

Again, we should choose the smaller of $AA^\top$ and $A^\top A$ to make our lives easier.
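
To make this recipe concrete, here is a rough sketch that performs SVD "by hand" on the 3×2 matrix used below: eigen-decompose the smaller product $A^\top A$ to get V and the squared singular values, then recover the thin version of U from $Av_i = \sigma_i u_i$. This is only an illustration under the assumption that all singular values are non-zero; np.linalg.svd in the next snippet is the practical way to do this.

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

# Eigen-decompose the smaller square matrix A^T A (2x2, symmetric)
values, V = np.linalg.eigh(A.T @ A)

# Sort eigenvalues and eigenvectors in descending order
order = np.argsort(values)[::-1]
values, V = values[order], V[:, order]

# Singular values are the square roots of the eigenvalues of A^T A
sigmas = np.sqrt(values)

# Recover the (thin) U column by column from A v_i = sigma_i u_i
U = A @ V / sigmas

# Rebuild A from U, Σ, and V^T
print(U @ np.diag(sigmas) @ V.T)   # matches the original A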

Let’s try SVD with NumPy:

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

print('A=')
print(A)
print()

U, S, VT = np.linalg.svd(A)

print('U=')
print(U)
print()

print('S=')
print(S)
print()

print('VT=')
print(VT)
print()

Below is the output:

A=
[[1. 2.]
 [3. 4.]
 [5. 6.]]

U=
[[-0.2298477 0.88346102 0.40824829]
 [-0.52474482 0.24078249 -0.81649658]
 [-0.81964194 -0.40189603 0.40824829]]

S=
[9.52551809 0.51430058]

VT=
[[-0.61962948 -0.78489445]
 [-0.78489445 0.61962948]]

Let’s set up matrix Σ as follows:

Σ = np.zeros_like(A, dtype=np.float64)
Σ[:2] = np.diag(S)

print('Σ=')
print(Σ)
print()

Below is the output:

Σ=
[[9.52551809 0. ]
 [0. 0.51430058]
 [0. 0. ]]

Now, we can confirm $A = U \Sigma V^\top$.

print('UΣVT=')
print(U @ Σ @ VT)
print()

Below is the output:

UΣVT=
[[1. 2.]
 [3. 4.]
 [5. 6.]]

Now that we know how SVD works, we are ready to work on pseudo-inverse matrices.

8 Pseudo-inverse Matrix

An inverse matrix does not always exist, even for a square matrix. However, a pseudo-inverse matrix (also called the Moore–Penrose inverse) exists even for non-square matrices.

For example, suppose matrix A is m×n and:

$$Ax = y$$

Using a pseudo-inverse matrix $A^{+}$, we can perform the following conversion:

$$x = A^{+} y$$
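
This is what makes the pseudo-inverse useful in practice: when the system $Ax = y$ has more equations than unknowns, $x = A^{+}y$ is the least-squares solution. Below is a small sketch with a made-up overdetermined system:

import numpy as np

# Overdetermined system: 3 equations, 2 unknowns (made-up example)
A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)
y = np.array([1.0, 2.0, 3.0])

x = np.linalg.pinv(A) @ y

print(x)
print(np.linalg.lstsq(A, y, rcond=None)[0])   # the least-squares solver agrees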

We define a pseudo-inverse matrix $A^{+}$ as:

$$A^{+} = V D^{+} U^\top$$

V and U are from SVD:

$$A = U \Sigma V^\top$$

We construct $D^{+}$ by transposing Σ and inverting all the non-zero diagonal elements. Suppose Σ is defined as follows:

$$\Sigma = \begin{bmatrix}
\sigma_1 & & & \\
& \sigma_2 & & \\
& & \ddots & \\
& & & \sigma_s \\
& & & & 0 \\
& & & & & \ddots \\
& & & & & & 0
\end{bmatrix}$$

Then D+ is defined as follows:

$$D^{+} = \begin{bmatrix}
\frac{1}{\sigma_1} & & & \\
& \frac{1}{\sigma_2} & & \\
& & \ddots & \\
& & & \frac{1}{\sigma_s} \\
& & & & 0 \\
& & & & & \ddots \\
& & & & & & 0
\end{bmatrix}$$

Now, we can see how $A^{+}A$ works:

$$\begin{aligned}
A^{+} A &= (V D^{+} U^\top)(U \Sigma V^\top) \\
&= V D^{+} \Sigma V^\top \quad (\text{note: } D^{+}\Sigma = I) \\
&= V V^\top \\
&= I
\end{aligned}$$

In the same way, $A A^{+} = I$ when $\Sigma D^{+} = I$ (that is, when A has full row rank).

In summary, if we can perform SVD on matrix A, we can calculate its pseudo-inverse as $A^{+} = V D^{+} U^\top$.
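
As a sanity check of that formula, we can build $A^{+}$ by hand from the SVD factors before reaching for np.linalg.pinv (a sketch reusing the same 3×2 matrix; $D^{+}$ here is Σ transposed with its non-zero diagonal elements inverted):

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

U, S, VT = np.linalg.svd(A)

# D+ is the transpose of Σ (so 2x3 here) with the non-zero diagonal inverted
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:len(S), :len(S)] = np.diag(1.0 / S)

A_plus = VT.T @ D_plus @ U.T

print(A_plus)
print(np.linalg.pinv(A))   # should match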

Let’s try the pseudo-inverse with NumPy:

import numpy as np

A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]], dtype=np.float64)

AP = np.linalg.pinv(A)

print('AP @ A')
print(AP @ A)

Below is the output:

AP @ A
[[ 1.0000000000000004e+00 0.0000000000000000e+00]
 [-4.4408920985006262e-16 9.9999999999999956e-01]]

Congratulations! You’ve made it this far. I hope you’ve enjoyed it!
