
Basics of Linear Algebra

Linearity is the simplest structure in mathematics. Let's review the basic notation and terminology of linear algebra.

$A$ is an $m \times n$ real matrix, written $A \in \mathbb{R}^{m \times n}$, if

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

where $a_{ij} \in \mathbb{R}$. The $(i,j)$-th entry of $A$ is $A_{ij} = a_{ij}$.

The transpose of $A \in \mathbb{R}^{m \times n}$ is defined as

$$A^T = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{m1} \\ A_{12} & A_{22} & \cdots & A_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{mn} \end{bmatrix} \in \mathbb{R}^{n \times m}$$

In other words, $(A^T)_{ij} = A_{ji}$.

Note: $x \in \mathbb{R}^n$ is considered to be a column vector in $\mathbb{R}^{n \times 1}$.

Definition(Sums and products of matrices). The sum of matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times n}$ is the matrix $A + B \in \mathbb{R}^{m \times n}$ such that

$$(A + B)_{ij} = A_{ij} + B_{ij}$$

The product of matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times \ell}$ is the matrix $AB \in \mathbb{R}^{m \times \ell}$ such that

$$(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$

The $n \times n$ identity matrix, denoted $I_n$ or $I$ for short, is

$$I = I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \in \mathbb{R}^{n \times n}$$

We have $IA = A = AI$.

If it exists, the inverse of $A$, denoted $A^{-1}$, is a matrix such that $A^{-1} A = I$ and $A A^{-1} = I$. If $A^{-1}$ exists, we say that $A$ is invertible. We have $(A^{-1})^T = (A^T)^{-1}$ and $(AB)^{-1} = B^{-1} A^{-1}$.

The trace of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted $\operatorname{tr}(A)$, is defined as $\operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}$. We have $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ if $AB$ is a square matrix.

We say $A$ is symmetric if $A = A^T$.

The rank of a matrix $A$ is the dimension of $A$'s column space.

NumPy Matrix Operations

NumPy's matrix operations are built on the optimized BLAS and LAPACK libraries, so in Python it is usually faster to use NumPy for matrix operations than to write your own loops.

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Identity matrix
I = np.eye(2)
# Matrix addition
C = A + B
# Matrix multiplication
D = A @ B
# Matrix entry-wise multiplication
E = A * B 
# Matrix transpose
F = A.T
# Matrix inverse
G = np.linalg.inv(A)
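
As a quick numerical sanity check, the identities stated above (transpose of an inverse, inverse of a product, cyclic property of the trace, rank) can be verified with the matrices A and B just defined; this is only a minimal sketch:

# Verify (A^{-1})^T = (A^T)^{-1}
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))                     # True
# Verify (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))  # True
# Verify tr(AB) = tr(BA)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                            # True
# Rank: dimension of the column space
print(np.linalg.matrix_rank(A))                                                # 2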

Eigenvalues and Eigenvectors

Given two vectors $u, v \in \mathbb{R}^d$, the inner product is $\langle u, v \rangle := u^T v = \sum_{j=1}^{d} u_j v_j$. We define the 2-norm of $u$ as $\|u\| = \sqrt{\langle u, u \rangle}$. The cosine of the angle between $u$ and $v$ is $\cos \alpha = \frac{\langle u, v \rangle}{\|u\| \, \|v\|}$. We say $u$ is orthogonal to $v$, denoted as $u \perp v$, if $\langle u, v \rangle = 0$.
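
These quantities map directly onto NumPy operations. Below is a small sketch; the vectors u and v are arbitrary examples chosen to be orthogonal:

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, 0.0, -1.0])

inner = u @ v                                                  # inner product <u, v>
norm_u = np.linalg.norm(u)                                     # 2-norm ||u||
cos_alpha = inner / (np.linalg.norm(u) * np.linalg.norm(v))    # cosine of the angle between u and v
print(inner, norm_u, cos_alpha)                                # 0.0 3.0 0.0, so u is orthogonal to v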

Definition(Eigenvalues and Eigenvectors). We say $\lambda$ and $v \neq 0$ are an eigenvalue and eigenvector of a matrix $A$ if $Av = \lambda v$.

Theorem(Eigenvalue Decomposition). A symmetric matrix $A \in \mathbb{R}^{d \times d}$ has:

  • Real eigenvalues: $\lambda_1 \geq \cdots \geq \lambda_d$
  • Orthonormal eigenvectors: $u_1, \ldots, u_d$ such that $\|u_i\| = 1$ and $u_i \perp u_j$ for all $1 \leq i < j \leq d$

such that we have the eigenvalue decomposition:

$$A = \sum_{i=1}^{d} \lambda_i u_i u_i^T$$

Denote $U = [u_1, \ldots, u_d] \in \mathbb{R}^{d \times d}$ and

$$\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_d) := \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_d \end{bmatrix}$$

The eigenvalue decomposition can also be written as $A = U \Lambda U^T$. We call $U$ an orthogonal matrix since $U^T U = I_d$.
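
A minimal numerical illustration: np.linalg.eigh is designed for symmetric matrices and returns eigenvalues in ascending order together with orthonormal eigenvectors. The matrix S below is an arbitrary symmetric example.

S = np.array([[2.0, 1.0], [1.0, 2.0]])                 # a symmetric example matrix
eigvals, U = np.linalg.eigh(S)                         # eigenvalues (ascending) and eigenvectors (columns of U)
print(eigvals)                                         # [1. 3.]
print(np.allclose(U.T @ U, np.eye(2)))                 # True: U is orthogonal
print(np.allclose(U @ np.diag(eigvals) @ U.T, S))      # True: S = U Λ U^T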

As a cornerstone of this chapter, we prefer to interpret concepts as solutions of optimization problems. Such an interpretation is usually called a variational form. The following theorem gives a variational form of the eigenvalues.

Theorem(Variational Form of Eigenvalues). Given a symmetric matrix $A$ as above, its maximum eigenvalue satisfies:

$$\lambda_{\max}(A) := \lambda_1 = \max_{\|x\| = 1} x^T A x = \max_{x \neq 0} \frac{x^T A x}{x^T x}$$

and $u_1 = \arg\max_{\|x\| = 1} x^T A x$,

and its minimum eigenvalue satisfies:

$$\lambda_{\min}(A) := \lambda_d = \min_{\|x\| = 1} x^T A x = \min_{x \neq 0} \frac{x^T A x}{x^T x}$$

and $u_d = \arg\min_{\|x\| = 1} x^T A x$.
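
One way to see the variational form numerically is to evaluate the Rayleigh quotient $x^T S x$ over many random unit vectors and compare with the exact extreme eigenvalues. Below is a sketch on the same 2×2 symmetric example as before:

S = np.array([[2.0, 1.0], [1.0, 2.0]])                 # same symmetric example as above
rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 2))
X = X / np.linalg.norm(X, axis=1, keepdims=True)       # random unit vectors x with ||x|| = 1
rayleigh = np.einsum('ij,jk,ik->i', X, S, X)           # x^T S x for each unit vector
print(rayleigh.max(), rayleigh.min())                  # close to lambda_max = 3 and lambda_min = 1
print(np.linalg.eigvalsh(S))                           # exact eigenvalues: [1. 3.]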

Figure: Variational form of eigenvalues

Question: What is the variational form of the other eigenvalues? See the visualization in the figure above.

The concept of eigenvalue decomposition can be generalized to non-symmetric or even non-square matrices. We have the following theorem on singular value decomposition.

Theorem(Singular Value Decomposition). Given a rank-$r$ matrix $A \in \mathbb{R}^{n \times d}$, there exist:

  • Singular values: $\sigma_1 \geq \cdots \geq \sigma_r > 0$, and denote $D = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$
  • Matrices with orthonormal columns: $U = [u_1, \ldots, u_r] \in \mathbb{R}^{n \times r}$ and $V = [v_1, \ldots, v_r] \in \mathbb{R}^{d \times r}$ satisfying $U^T U = V^T V = I_r$

such that we have the singular value decomposition (SVD):

$$A = U D V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$
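
In NumPy, np.linalg.svd computes this decomposition; with full_matrices=False it returns a compact form with min(n, d) singular values, which matches the reduced form in the theorem when A has full rank. A small sketch on an arbitrary 3×2 example:

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])     # an arbitrary 3x2 example matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)       # U: 3x2, s: singular values, Vt = V^T: 2x2
print(s)                                               # singular values in decreasing order
print(np.allclose(U @ np.diag(s) @ Vt, A))             # True: A = U D V^T
print(np.allclose(U.T @ U, np.eye(2)))                 # True: columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(2)))               # True: columns of V are orthonormal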

We can see that the eigenvalue decomposition is a special SVD. The matrix $A$ as a linear map transforms the two canonical unit vectors (yellow and red vectors) in the top left plot to the top right plot. This map can be decomposed into three steps:

  1. $V^T$: rotating the yellow and red vectors to $v_1$ and $v_2$
  2. $D$: scaling the disc by $\sigma_1$ horizontally and $\sigma_2$ vertically
  3. $U$: rotating the two canonical unit vectors (blue lines) to $u_1$ and $u_2$

Figure: Visualization of SVD

Given the concepts above, we can define the matrix operator norm.

Definition(Matrix Operator Norm). For a matrix $A \in \mathbb{R}^{n \times d}$, its operator norm (also called spectral norm) is defined as:

$$\|A\|_2 = \max_{\|x\| \leq 1} \|Ax\| = \max_{\|x\| \leq 1} \sqrt{x^T A^T A x} = \sqrt{\lambda_{\max}(A^T A)} = \sigma_1$$

where $\sigma_1$ is the largest singular value of $A$. The operator norm measures the maximum "stretching factor" of the matrix when applied to any unit vector. Geometrically, it represents the semi-major axis of the ellipse that the unit ball is mapped to under the linear transformation $A$. In the following, we will simply write $\|A\|$ for the operator norm of $A$.
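
The operator norm can be computed directly with np.linalg.norm using ord=2, and it agrees with the largest singular value. A quick check on the same 3×2 example:

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
spectral_norm = np.linalg.norm(A, ord=2)               # operator (spectral) norm of A
sigma_1 = np.linalg.svd(A, compute_uv=False)[0]        # largest singular value
print(np.isclose(spectral_norm, sigma_1))              # True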

Connection between SVD and Eigenvalue Decomposition:

  • The singular values of $A$ are the square roots of the eigenvalues of $A^T A$: $\sigma_i(A) = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(A A^T)}$ (see the numerical check below).
  • The left singular vectors of $A$ are the eigenvectors of $A A^T$.
  • The right singular vectors of $A$ are the eigenvectors of $A^T A$.
  • If $A$ is symmetric, it has real eigenvalues and eigenvectors. Assume $|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_d|$; then $\sigma_i(A) = |\lambda_i|$. Note that here we order the eigenvalues by their absolute values.
  • If $A$ is positive semidefinite, it has non-negative eigenvalues, and its SVD coincides with its eigenvalue decomposition (ignoring the zero eigenvalues).
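
A quick numerical check of the first relation on the same 3×2 example; note that np.linalg.eigvalsh returns eigenvalues in ascending order, so they are reversed before comparison:

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
sing_vals = np.linalg.svd(A, compute_uv=False)         # sigma_1 >= sigma_2
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]            # eigenvalues of A^T A, largest first
print(np.allclose(sing_vals, np.sqrt(eig_AtA)))        # True: sigma_i(A) = sqrt(lambda_i(A^T A))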