Detailed Explanation of QR Decomposition by Classical & Modified Gram-Schmidt Method

Anthony Kwok
12 min read · Jan 19, 2023


Image from https://m.blog.naver.com/qio910/221805507053

Episode 1: QR Decomposition by Classical Gram-Schmidt (CGS) and Modified Gram-Schmidt (MGS)

Episode 2: QR Decomposition by Householder Transformation

Episode 3: QR Decomposition by Givens Rotation

In the world of data science, there are some basic and important concepts such as the Least Square Problem, and QR decomposition is useful for solving it. Beyond the data science field, QR decomposition is also used in the QR algorithm to find the eigenvectors and eigenvalues of a matrix, another important tool in Linear Algebra.

In this episode (episode 1), we will look into the Classical and Modified Gram-Schmidt methods for performing QR decomposition.

Now, we are going to explain what QR decomposition is trying to do. If you are not familiar with QR decomposition, these are some key points that you need to know before we start.

“Q” in QR decomposition represents an orthogonal matrix, which means:

Matrix Q: Orthogonal Matrix

Remarks: for an orthogonal matrix, the dot product of any 2 distinct column vectors must be equal to 0, and each column vector has unit length. In other words, the columns are orthonormal, so Q^T Q = I.

“R” in QR decomposition represents a right (upper) triangular matrix, which means:

Image by Author. Matrix R: Right (Upper) Triangular Matrix

Since a right triangular matrix is also called an upper (U) triangular matrix, QR decomposition is sometimes called QU decomposition.
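As a quick sanity check, here is a minimal NumPy sketch (the rotation matrix below is just an illustrative example of an orthogonal matrix, not part of this article's decomposition) showing that the columns of an orthogonal matrix are orthonormal:

import numpy as np

# A 2-D rotation matrix is a simple example of an orthogonal matrix
theta = np.pi / 3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# For an orthogonal matrix, Q^T Q = I: distinct columns have dot product 0
# and every column has length 1
print(np.allclose(Q.T @ Q, np.eye(2)))  # True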

Some Characteristics of QR Decomposition

  1. QR Decomposition always exists, but may not be unique
  2. If the determinant of matrix A is not 0 and we require all diagonal entries of R to be positive, then the QR decomposition is unique.
  3. There are two types of QR decomposition:
    a. Full QR Decomposition
    b. Reduced QR Decomposition

For simplicity, we will only include full QR decomposition for square matrices here, where matrix A is full rank (i.e. all columns are linearly independent).

QR decomposition is also applicable to m x n (m-by-n) matrices, where m ≥ n.
In that case, a full QR decomposition gives an m x m (m-by-m) matrix Q and an m x n (m-by-n) matrix R.
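If you want to see the shapes of both types, NumPy's np.linalg.qr exposes them through its mode argument; below is a minimal sketch with an arbitrary 4 x 2 example matrix (not the matrix used later in this article):

import numpy as np

A = np.arange(8, dtype=float).reshape(4, 2)  # an arbitrary 4 x 2 (m > n) matrix

Q_full, R_full = np.linalg.qr(A, mode="complete")  # full QR
Q_red, R_red = np.linalg.qr(A, mode="reduced")     # reduced QR

print(Q_full.shape, R_full.shape)  # (4, 4) (4, 2)
print(Q_red.shape, R_red.shape)    # (4, 2) (2, 2)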

4 Methods to perform QR decomposition

  1. (Classical) Gram-Schmidt Process (CGS)
  2. Modified Gram-Schmidt Process (MGS)
  3. Householder Transformation
  4. Givens Rotation

In this article, we will only cover the Gram-Schmidt Process and the Modified Gram-Schmidt Process.

Set-Up

This time, we will go through the steps of QR decomposition and the implementation of QR decomposition with Python code.

First, we need to create a Python class called Matrix, which will be used in our example. For the full Python code, you may refer to GitHub. In this Python class, we include some major components and functions, such as the determinant of a matrix, printing out the matrix, transpose, etc. Of course, you can also use the matrix functionality provided by NumPy.

import numpy as np

# Set-up of the Matrix class
class Matrix:
    def __init__(self, n_row, n_col, all_zero=True, two_d_array=None):
        self.values = []
        self.determinant = None
        self.n_row = n_row
        self.n_col = n_col
        if two_d_array is not None:
            assert type(two_d_array) == list and type(two_d_array[0]) == list
            # Expand the requested shape if the supplied data is larger
            self.n_row = n_row = max(len(two_d_array), n_row)
            self.n_col = n_col = max(max([len(row) for row in two_d_array]), n_col)
            for i in range(n_row):
                row_val = []
                for j in range(n_col):
                    # Pad any missing entry with 0
                    try:
                        row_val.append(float(two_d_array[i][j]))
                    except (IndexError, TypeError, ValueError):
                        row_val.append(0.0)
                self.values.append(row_val)
        elif all_zero:
            for i in range(n_row):
                row_val = []
                for j in range(n_col):
                    row_val.append(0.0)
                self.values.append(row_val)

        # The determinant is only defined for square matrices
        if self.n_row == self.n_col:
            self.calculate_determinant(self.values)

    def set_diagonal(self, values=1):
        assert self.n_col == self.n_row, "set_diagonal only available to square matrix"
        for i in range(self.n_col):
            self.values[i][i] = values

    # Return the sub-matrix (minor) used for calculating the determinant
    @staticmethod
    def sub_matrix(two_d_array, remove_row, remove_col):
        new_matrix = []
        for i in range(len(two_d_array)):
            if remove_row == i:
                continue
            new_row = []
            for j in range(len(two_d_array[i])):
                if remove_col == j:
                    continue
                new_row.append(two_d_array[i][j])
            new_matrix.append(new_row)
        return new_matrix

    def calculate_determinant(self, two_d_array):
        n_row = len(two_d_array)
        n_col = len(two_d_array[0])
        assert n_row == n_col, "calculate_determinant only available to square matrix"

        if n_row == 1:
            self.determinant = two_d_array[0][0]
            return self.determinant

        # Cofactor expansion along the first column
        det = 0
        for i in range(n_row):
            factor = 1 if i % 2 == 0 else -1
            first_num_in_row = two_d_array[i][0]
            sub_matrix = self.sub_matrix(two_d_array, i, 0)
            det += factor * first_num_in_row * self.calculate_determinant(sub_matrix)

        self.determinant = det
        return det

    def transpose(self):
        new_values = []
        for i in range(self.n_col):
            new_row = []
            for j in range(self.n_row):
                new_row.append(self.values[j][i])
            new_values.append(new_row)
        self.values = new_values
        # Swap the dimensions after transposing
        self.n_row, self.n_col = self.n_col, self.n_row

    def print_shape(self):
        print(f"Matrix Shape: ({self.n_row},{self.n_col})")

    def print_determinant(self):
        print(f"Determinant: {self.determinant}")

    def print_values(self, matrix_name="", num_digits=4):
        if len(matrix_name) > 0:
            print(f"{matrix_name}:")
        for row in self.values:
            for row_val in row:
                print(f"{round(row_val, num_digits)}\t", end="")
            print("")
        print("")

    # Get the projection of a vector on another vector
    def proj_on(self, vector, on_vector):
        assert len(vector) == len(on_vector), "Vectors must be same length"
        dot_prod = self.dot_product(vector, on_vector)
        on_vector_length = self.length(on_vector) ** 2
        res = (dot_prod / on_vector_length) * np.array(on_vector)
        return res

    # Get the length (Euclidean norm) of a vector
    @staticmethod
    def length(vector):
        return np.sqrt(np.sum([i**2 for i in vector]))

    # Dot product of two vectors
    @staticmethod
    def dot_product(vector_1, vector_2):
        return np.sum([i * j for i, j in zip(vector_1, vector_2)])

    def multiply(self, new_M):
        assert type(new_M) == Matrix
        assert self.n_col == new_M.n_row
        res_M = Matrix(self.n_row, new_M.n_col)

        # Entry (i, j) is the dot product of row i (left) and column j (right)
        for i in range(self.n_row):
            for j in range(new_M.n_col):
                left_vector = self.values[i]
                right_vector = [row[j] for row in new_M.values]
                res_M.values[i][j] = self.dot_product(left_vector, right_vector)
        return res_M

First, we create a 3-by-3 Matrix A for our QR decomposition example.

# main.py
# Create the matrix A
A = Matrix(
    n_row=3,
    n_col=3,
    two_d_array=[[2, -1, -2], [-4, 6, 3], [-4, -2, 8]],
)

Projection and Rejection of Vector

Before going into the steps of the Gram-Schmidt Process, it is better to first explain the major concept behind it: the projection of a vector. If you already know vector projection and vector rejection well, you can skip this part and go to QR Decomposition by the Gram-Schmidt Process.

In the following example, we will introduce two concepts, Vector Projection and Vector Rejection, using two vectors, a and u.

Vector Projection

The projection of a vector a on another vector u (or a plane) is the orthogonal projection of a onto u (or onto that plane).

Image by Author. Explanation of Vector Projection

Vector Rejection

The rejection of vector a from another vector u (or a plane) is the orthogonal projection of a onto a straight line (or plane) that is orthogonal to u (or to that plane).

In our example above, we first construct another vector w which is orthogonal to u.

Image by Author. Explanation of Vector Rejection

Decomposing the vector

Image by Author. Vector Projection and Vector Rejection

With vector projection and vector rejection, we are actually decomposing a vector a into two vectors which are parallel to u and w respectively.

As you can see, the vector projection and the vector rejection are orthogonal to each other, so the dot product of these two vectors is equal to 0.
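Here is a minimal NumPy sketch of these two operations; the vectors a and u below are arbitrary illustrative values, and proj is a hypothetical helper mirroring the proj_on method we defined in the Matrix class:

import numpy as np

# Hypothetical helper mirroring Matrix.proj_on: projection of a onto u
def proj(a, u):
    return (np.dot(a, u) / np.dot(u, u)) * u

a = np.array([3.0, 4.0])
u = np.array([1.0, 0.0])

projection = proj(a, u)     # the part of a parallel to u
rejection = a - projection  # the part of a orthogonal to u (parallel to w)

print(projection, rejection)          # [3. 0.] [0. 4.]
print(np.dot(projection, rejection))  # 0.0: the two parts are orthogonal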

QR Decomposition by Gram-Schmidt Process

What the Gram-Schmidt Process does is decompose each column vector of A into a linear combination of orthogonal vectors: Q holds the directions of those orthogonal vectors (as unit columns), while R holds the coefficients of each linear combination, so that we can reconstruct the matrix A (i.e. A = QR).

So, how to get matrix Q?

First, use the first column of A (a_1) as the first orthogonal vector (u_1).
As the magnitude of the columns of the orthogonal matrix Q is not important (only the direction matters), we convert each orthogonal vector into a unit vector by dividing it by its magnitude.

Image by Author. First Iteration in CGS

After calculating the first orthogonal unit vector, then we go to the next column in A.

For the second column of A (a_2), we remove its projection on the first orthogonal vector (u_1). As introduced above, the remaining vector (the vector rejection) is the second orthogonal vector (u_2). We obtain the second column of Q (q_2) by normalizing u_2.

Image by Author. Second Iteration in CGS

Now, we move to the last column in Q.

Image by Author. Third Iteration in CGS

After getting all orthogonal vectors, we have the matrix Q now.

Image by Author. Matrix Q

What about matrix R?

Since we already have matrix Q, we can use A = QR together with the property of the orthogonal matrix (Q^T Q = I).

Multiplying both sides of A = QR by Q^T on the left gives R = Q^T A, so we can calculate matrix R by multiplying the transpose of Q and A. Now let's see how we can implement the Gram-Schmidt Process in Python code.

Image by Author. How to get Matrix R?

Now, we have matrix R.

Image by Author. Matrix R
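As a quick check in NumPy, here is a sketch of the R = Q^T A step; it borrows np.linalg.qr only to obtain a valid Q for the example matrix, rather than the hand-computed one above:

import numpy as np

A = np.array([[2., -1., -2.],
              [-4., 6., 3.],
              [-4., -2., 8.]])

# Any valid Q for A works here; np.linalg.qr is used just to obtain one
Q, _ = np.linalg.qr(A)

# Since Q^T Q = I, A = QR implies R = Q^T A
R = Q.T @ A
print(np.allclose(Q @ R, A))  # True: QR reconstructs A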

[Step-by-Step Python Code] QR Decomposition by Classical Gram-Schmidt (CGS) Process:

Let’s move to the implementation of the Classical Gram-Schmidt Process with Python.

# QR Decomposition by the Classical Gram-Schmidt Process
def QR_GS(M: Matrix) -> tuple:
    Q_vectors = []
    for i in range(M.n_col):
        # Get the i-th column vector of M
        curr_column = [row[i] for row in M.values]

        # Remove the projection of the current column on every
        # previously computed orthogonal vector (vector rejection)
        q = np.array(curr_column, dtype=float)
        for k in range(i):
            q -= M.proj_on(curr_column, Q_vectors[k])

        # Normalise the orthogonal vector
        q = q / M.length(q)

        # Store the vector as a row; we transpose after the iterations
        Q_vectors.append(list(q))

    # Q_vectors holds the q's as rows, so this matrix is actually Q^T
    Q = Matrix(n_col=M.n_col, n_row=M.n_row, two_d_array=Q_vectors)

    # Calculate R = Q^T * A, using the property of the orthogonal matrix
    R = Q.multiply(M)

    # Transpose Q^T to get the correct Q matrix
    Q.transpose()

    return (Q, R)
# main.py
# ----- Example of Gram-Schmidt Process -----
print("----- Gram-Schmidt Process -----")
A.print_values(matrix_name="A")
Q, R = QR_GS(A)

Q.print_values(matrix_name="Q", num_digits=4)

R.print_values(matrix_name="R", num_digits=4)

# Validate the answer
print("----- Separator -----")
new_A = Q.multiply(R)
new_A.print_values(matrix_name="new_A", num_digits=4)
Image by Author. Output from main.py

The NumPy package also provides a function for QR decomposition (np.linalg.qr). You can use it directly if you want; you may find the documentation here.
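For reference, here is a minimal usage sketch with the example matrix from this article. Note that NumPy may return Q and R with opposite signs to the Gram-Schmidt result above; both are valid QR decompositions:

import numpy as np

A = np.array([[2., -1., -2.],
              [-4., 6., 3.],
              [-4., -2., 8.]])

Q, R = np.linalg.qr(A)  # 'reduced' mode by default; same as full QR for square A
print(Q)
print(R)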

To recap what the Gram-Schmidt Process is actually doing: each column vector in A is a linear combination of the column vectors in Q.

Image by Author. Column vector in A is the linear combination of column vector in orthogonal matrix Q — First Column
Image by Author. Column vector in A is the linear combination of column vector in orthogonal matrix Q — Second Column
Image by Author. Column vector in A is the linear combination of column vector in orthogonal matrix Q — Third Column

You can check the third column yourself.

QR Decomposition by Modified Gram-Schmidt Process

Why is there a modified version of the Gram-Schmidt Process?

This is because the classical Gram-Schmidt Process is not numerically stable for computation.

What is numerical instability?

“In the mathematical subfield of numerical analysis, numerical stability is a generally desirable property of numerical algorithms. The precise definition of stability depends on the context. One is numerical linear algebra and the other is algorithms for solving ordinary and partial differential equations by discrete approximation.” — Wikipedia

During the iterations of a numerical algorithm, errors (i.e. rounding errors, or the gap between the actual value and the approximated value) may arise in some steps, and these errors may be carried forward to the next iteration and so on. The errors can be magnified as the number of iterations grows.

Why is Gram-Schmidt Process numerically unstable?

As we can see from the above example, the Gram-Schmidt Process is a numerical algorithm that helps us decompose a matrix. In real-life use cases, the numbers in the matrix may not be integers, and the result of each iteration could be a number with many decimal places or even an irrational number. In that case, round-off or precision errors will exist.

The orthogonal vector (u) computed in each iteration can carry a round-off error, and this error is brought into the next iteration, because u_x is used to calculate u_(x+1), then u_x and u_(x+1) are used to calculate u_(x+2), and so on. A tiny rounding error in step 1 may therefore be amplified and affect the result in step 100.

Therefore, the Gram-Schmidt Process is numerically unstable.

Can we solve this problem?

Not entirely. We cannot eliminate the errors completely, but we can reduce their impact.

Modified Gram-Schmidt (MGS) Process is a modified version of Gram-Schmidt Process that tries to improve the problem of numerical instability.

The problem of classical Gram-Schmidt (CGS) Process is that the orthogonal vector u_x will affect other orthogonal vectors u_(x+1) and u_(x+2) and so on.

The solution of Modified Gram-Schmidt is, at the x-th iteration, to remove from all columns to the right (i.e. u_(x+1), u_(x+2), …) their vector projection on u_x.

In other words,

In iteration 1: u_2, u_3, … u_n will remove their vector projection on u_1
In iteration 2: u_3, u_4, … u_n will remove their vector projection on u_2
In iteration 3: u_4, u_5, … u_n will remove their vector projection on u_3

From the u_3 point of view, it has removed its vector projection on u_1 and u_2.
From the u_4 point of view, it has removed its vector projection on u_1, u_2, and u_3.

and so on…
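To see this difference numerically, here is a small sketch (not from the original article) that implements both processes with NumPy and runs them on a nearly rank-deficient, Läuchli-style matrix; the helper names cgs and mgs are illustrative. You should see CGS losing orthogonality badly while MGS stays close to machine precision:

import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: each column is orthogonalised against the
    already-computed q's, always projecting the *original* column."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for i in range(n):
        v = A[:, i].copy()
        for k in range(i):
            v -= (Q[:, k] @ A[:, i]) * Q[:, k]  # projections use original a_i
        Q[:, i] = v / np.linalg.norm(v)
    return Q

def mgs(A):
    """Modified Gram-Schmidt: as soon as q_i is known, its projection is
    removed from every remaining column, so later steps see updated columns."""
    V = A.astype(float)
    m, n = V.shape
    Q = np.zeros((m, n))
    for i in range(n):
        Q[:, i] = V[:, i] / np.linalg.norm(V[:, i])
        for j in range(i + 1, n):
            V[:, j] -= (Q[:, i] @ V[:, j]) * Q[:, i]
    return Q

# Lauchli-style matrix: the columns are nearly linearly dependent
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

for name, f in [("CGS", cgs), ("MGS", mgs)]:
    Q = f(A)
    err = np.linalg.norm(Q.T @ Q - np.eye(3))
    print(f"{name}: ||Q^T Q - I|| = {err:.2e}")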

The following is one of the explanations of why MGS is better than CGS. You may find the entire document here.

Image by Author. Link: https://www.math.uci.edu/~ttrogdon/105A/html/Lecture23.html

How to get Matrix Q?

Let’s try out MGS with the same example. We first normalize the selected column vector from A, which is a_1, giving q_1 in the graph below. Then we remove from every column to the right of a_1 its projection on the normalized vector q_1.

Image by Author. First Iteration in MGS

With the updated a_2 and a_3, repeat the steps for the second column.

Image by Author. Second Iteration in MGS

With the updated a_3, repeat the steps for the third column.

Image by Author. Third Iteration in MGS

After getting all orthogonal vectors, we now have the matrix Q, which is exactly the same as the result from the Classical Gram-Schmidt Process.

Image by Author. Matrix Q

What about matrix R?

For matrix R, the procedure is the same: you can obtain matrix R from A = QR and the property of the orthogonal matrix (R = Q^T A).

Image by Author. Matrix R

[Step-by-Step Python Code] QR Decomposition by Modified Gram-Schmidt (MGS) Process:

Let’s move to the implementation of the Modified Gram-Schmidt Process with Python.

# QR Decomposition by the Modified Gram-Schmidt Process
def QR_MGS(M: Matrix) -> tuple:
    # Get all column vectors in M (they will be updated in place)
    M_column_vector = []
    for i in range(M.n_col):
        M_column_vector.append(np.array([row[i] for row in M.values], dtype=float))

    # Start iteration
    Q_vectors = []
    for i in range(M.n_col):
        # Normalise the current (already updated) column vector
        q = M_column_vector[i]
        q = q / M.length(q)

        # Store the vector as a row; we transpose after the iterations
        Q_vectors.append(list(q))

        # Remove the projection on q from every remaining column to the right
        for j in range(i + 1, M.n_col):
            M_column_vector[j] -= M.proj_on(M_column_vector[j], q)

    # Q_vectors holds the q's as rows, so this matrix is actually Q^T
    Q = Matrix(n_col=M.n_col, n_row=M.n_row, two_d_array=Q_vectors)

    # Calculate R = Q^T * A, using the property of the orthogonal matrix
    R = Q.multiply(M)

    # Transpose Q^T to get the correct Q matrix
    Q.transpose()

    return (Q, R)
# main.py
# ----- Example of Modified Gram-Schmidt Process -----
print("----- Modified Gram-Schmidt Process -----")
A.print_values(matrix_name="A")
Q, R = QR_MGS(A)

Q.print_values(matrix_name="Q", num_digits=4)

R.print_values(matrix_name="R", num_digits=4)

print("----- Validate Result -----")
new_A = Q.multiply(R)
new_A.print_values(matrix_name="new_A", num_digits=4)
Image by Author. Output from main.py

Done! We have successfully implemented both the Classical Gram-Schmidt and Modified Gram-Schmidt Processes for QR decomposition. I hope you now understand QR decomposition by the CGS / MGS method better.

The Python code for the above example can be found on GitHub.

If you like this article, give it a like and share it with your friends. You may also leave me a comment or message.
