
Eigenvalues and Eigenvectors Algorithms

We now discuss numerical algorithms for computing eigenvalues and eigenvectors. We will mainly focus on symmetric matrices, which have real eigenvalues and eigenvectors. For general matrices, most tasks in data science rely on the singular value decomposition (SVD) instead.

Recall that a symmetric matrix $A$ has the eigenvalue decomposition $A = U \Lambda U^T$, where $U = (u_1, u_2, \dots, u_n)$ is an orthogonal matrix whose columns $u_1, u_2, \dots, u_n$ are the eigenvectors of $A$, and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$ is a diagonal matrix whose diagonal entries $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ are the eigenvalues of $A$.
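As a quick sanity check, here is a minimal numpy sketch (with an arbitrary 2x2 symmetric test matrix) verifying this decomposition; note that np.linalg.eigh returns the eigenvalues in ascending order, so we reverse them to match the ordering above.

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, U = np.linalg.eigh(A)            # eigh returns eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]        # reorder so that lambda_1 >= ... >= lambda_n
print(np.allclose(A, U @ np.diag(lam) @ U.T))   # True: A = U Lambda U^T
print(np.allclose(U.T @ U, np.eye(2)))          # True: U is orthogonal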

Sensitivity of Eigenvalues and Eigenvectors

There are two famous theorems about the sensitivity of eigenvalues and eigenvectors.

Theorem (Weyl's inequality): If $A, B \in \mathbb{R}^{n \times n}$ are two symmetric matrices, then for the $j$-th largest eigenvalues $\lambda_j$ of $A$ and $\mu_j$ of $B$, we have:

$$|\lambda_j - \mu_j| \le \|A - B\|_2.$$

Theorem (Davis-Kahan Theorem): If $A, B \in \mathbb{R}^{n \times n}$ are two symmetric matrices with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ and $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_n$, and $u_j, v_j$ denote the $j$-th eigenvectors of $A$ and $B$ corresponding to $\lambda_j$ and $\mu_j$, then:

$$\|u_j - v_j\|_2 \le \frac{1.5\,\|A - B\|_2}{\min(\lambda_{j-1} - \lambda_j,\ \lambda_j - \lambda_{j+1})}.$$

From the above two theorems, we can see that the eigenvalues are stable, but the eigenvectors can be sensitive when the eigengap $\Delta_j = \min(\lambda_{j-1} - \lambda_j,\ \lambda_j - \lambda_{j+1})$ is small.

Be aware that the above theorems are simplified versions. Refer to Matrix Perturbation Theory for more details.
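To make the contrast concrete, here is a small numerical illustration (the matrices are arbitrary test examples): a tiny symmetric perturbation barely moves the eigenvalues, but because the eigengap between the top two eigenvalues is small, the leading eigenvector rotates substantially.

import numpy as np

np.random.seed(0)
# A has a small eigengap (0.01) between its top two eigenvalues
A = np.diag([1.00, 0.99, 0.10])
E = 1e-2 * np.random.rand(3, 3)
E = (E + E.T) / 2                      # small symmetric perturbation
B = A + E

lam_A, U = np.linalg.eigh(A)
lam_B, V = np.linalg.eigh(B)
print(np.max(np.abs(lam_A - lam_B)))   # small: eigenvalues are stable (Weyl)

u1, v1 = U[:, -1], V[:, -1]            # leading eigenvectors of A and B
v1 = v1 if u1 @ v1 > 0 else -v1        # fix the sign ambiguity
print(np.linalg.norm(u1 - v1))         # much larger than ||A - B||: the eigenvector is sensitive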

Power Method

Here we consider a positive definite matrix $A$ with eigenvalues $\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n > 0$.

The power method applies powers of the matrix $A$:

$$A^k = U \Lambda^k U^T = \sum_{j=1}^n \lambda_j^k u_j u_j^T.$$

Let us take a vector $x_0 \in \mathbb{R}^n$, which can be expressed as:

$$x_0 = \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n,$$

where $\alpha_i \in \mathbb{R}$. Thus, we have:

$$A^k x_0 = \sum_{j=1}^n \alpha_j A^k u_j = \lambda_1^k \left[\alpha_1 u_1 + \sum_{j=2}^n \alpha_j \left(\frac{\lambda_j}{\lambda_1}\right)^k u_j\right].$$

Therefore, it follows that:

$$\lim_{k \to \infty} \frac{A^k x_0}{\lambda_1^k} = \alpha_1 u_1.$$

This indicates that, when $\alpha_1 \ne 0$ and $k$ is sufficiently large, the vector

$$x_k = \frac{A^k x_0}{\lambda_1^k}$$

is a very good approximation of an eigenvector of $A$.

This motivates the power method to find the largest eigenvalue and its corresponding eigenvector.

$$x_{k+1} = A x_k, \qquad x_{k+1} \leftarrow \frac{x_{k+1}}{\|x_{k+1}\|}.$$

Notice that we normalize the vector $x_{k+1}$ at each iteration, since an eigenvector is only determined up to scaling, i.e., by its direction.

import numpy as np

A = np.array([[4, 1],
              [2, 3]])
tol = 1e-6
max_iter = 1000

# Initialize a random unit vector
n = A.shape[0]
x = np.random.rand(n)
x = x / np.linalg.norm(x)

for _ in range(max_iter):
    # Multiply A with the current vector
    x_new = A @ x
    # Normalize the vector
    x_new = x_new / np.linalg.norm(x_new)
    # Check for convergence
    if np.linalg.norm(x_new - x) < tol:
        x = x_new
        break
    x = x_new

# Rayleigh quotient gives the corresponding eigenvalue estimate
eigenvalue = x @ A @ x

Convergence and Sensitivity: We can see from the above analysis that the convergence speed of the power method is determined by the ratio $\lambda_2/\lambda_1$: if $\lambda_2$ is close to $\lambda_1$, convergence is slow. In other words, the convergence speed of the power method is governed by the eigengap.

Other eigenvectors: The power method above only finds the largest eigenvalue and its corresponding eigenvector. Since $A^{-1}$ has the largest eigenvalue $\lambda_n^{-1}$ with eigenvector $u_n$, we can apply the power method to $A^{-1}$ to get the smallest eigenvalue of $A$ and its corresponding eigenvector. This is called the inverse power method.

$$x_{k+1} = A^{-1} x_k, \qquad x_{k+1} \leftarrow \frac{x_{k+1}}{\|x_{k+1}\|}.$$

We can see that each iteration of the inverse power method needs to solve a linear system. You can improve the efficiency by applying an LU decomposition first, factoring once and reusing the factors in every iteration. Similar to the above analysis, the convergence speed of the inverse power method is determined by $\lambda_n/\lambda_{n-1}$.

We can further apply the inverse power method to $A - \mu I$ with some shift $\mu$ to accelerate the convergence. This is called the shifted inverse power method. Without loss of generality, suppose we want to compute $\lambda_1$ and assume $0 < |\lambda_1 - \mu| < |\lambda_2 - \mu| \le \cdots \le |\lambda_n - \mu|$. The convergence speed is determined by $|\lambda_1 - \mu|/|\lambda_2 - \mu|$, so the shift $\mu$ should be chosen as close to $\lambda_1$ as possible to accelerate the convergence. You may worry that when $\mu$ is too close to $\lambda_1$, the linear system with $A - \mu I$ becomes ill-conditioned. In fact, it can be shown that this ill-conditioning does not affect the convergence of the shifted inverse power method.
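Below is a minimal sketch of the shifted inverse power method (the test matrix, shift, and tolerances are arbitrary illustrative choices; setting the shift to zero recovers the plain inverse power method). The matrix $A - \mu I$ is factored once with an LU decomposition and the factors are reused in every iteration.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
mu = 1.2                                  # shift near the target eigenvalue (mu = 0: plain inverse power)
tol, max_iter = 1e-10, 1000

n = A.shape[0]
lu, piv = lu_factor(A - mu * np.eye(n))   # factor (A - mu I) once
x = np.random.rand(n)
x = x / np.linalg.norm(x)

for _ in range(max_iter):
    x_new = lu_solve((lu, piv), x)        # solve (A - mu I) x_new = x
    x_new = x_new / np.linalg.norm(x_new)
    if x_new @ x < 0:                     # fix the sign so successive iterates are comparable
        x_new = -x_new
    if np.linalg.norm(x_new - x) < tol:
        x = x_new
        break
    x = x_new

eigenvalue = x @ A @ x                    # Rayleigh quotient: eigenvalue of A closest to mu
print(eigenvalue, x)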

QR Method

Though the power method is simple and easy to implement, its convergence depends on the eigenvalue distribution. Moreover, the power method finds only one eigenvalue and its corresponding eigenvector at a time. We now discuss how to obtain the full eigendecomposition using the QR method.

QR Decomposition: For any matrix $A$, there exists a QR decomposition $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. The QR decomposition can be computed by Householder transformations with complexity $O(n^3)$.

QR decomposition can be applied to any matrix $A$; however, a non-symmetric $A$ may have complex eigenvalues, so we mainly focus on symmetric matrices in the following discussion. Also, for a rectangular matrix $A \in \mathbb{R}^{m \times n}$ with $m > n$, we can apply the QR decomposition as

$$A = Q \begin{pmatrix} R \\ 0 \end{pmatrix},$$

where $R \in \mathbb{R}^{n \times n}$ is an upper triangular matrix.
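As a small illustration with a random rectangular matrix, np.linalg.qr returns the reduced factorization by default, while mode='complete' gives the square $Q$ and the stacked form above:

import numpy as np

A = np.random.rand(5, 3)                     # rectangular, m > n
Q, R = np.linalg.qr(A)                       # reduced: Q is 5x3, R is 3x3 upper triangular
print(np.allclose(A, Q @ R))                 # True
Qc, Rc = np.linalg.qr(A, mode='complete')    # complete: Qc is 5x5, Rc is 5x3, i.e., R stacked on zeros
print(np.allclose(A, Qc @ Rc))               # True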

QR Iteration: We can use the QR iteration to find the eigenvalues and eigenvectors. Given $A_0 = A$, iterate:

$$A_k = Q_k R_k \ \text{(QR decomposition)}, \qquad A_{k+1} = R_k Q_k.$$

Theorem (Convergence of QR Iteration): If $A$ has distinct eigenvalues with $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0$, then $A_k$ converges to an upper triangular matrix with the eigenvalues on the diagonal, and $\tilde{Q}_k = Q_1 Q_2 \cdots Q_k$ converges to the matrix of eigenvectors.


In fact, we can see from the QR iteration that

$$A^k = Q_1 Q_2 \cdots Q_k \, R_k R_{k-1} \cdots R_1.$$

Therefore, the QR algorithm can be seen as a more sophisticated variation of the basic "power" eigenvalue algorithm.

Here's a simple implementation of the QR iteration method:

import numpy as np

# Create a symmetric test matrix
n = 4
A = np.random.rand(n, n)
A = (A + A.T) / 2  

# Initialize
A_k = A.copy()
V = np.eye(n)  # To accumulate eigenvectors
tol = 1e-8
max_iter = 1000

# QR iteration
for _ in range(max_iter):
    # QR decomposition
    Q, R = np.linalg.qr(A_k)

    # Update A_k
    A_k = R @ Q

    # Accumulate eigenvectors
    V = V @ Q

    # Check convergence of off-diagonal elements
    if np.sum(np.abs(A_k - np.diag(np.diag(A_k)))) < tol:
        break

# Eigenvalues are on the diagonal of final A_k
eigenvals = np.diag(A_k)
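# The accumulated Q factors give the eigenvectors (columns of V)
eigenvecs = V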

Practical Algorithms

We briefly discuss practical algorithms for computing eigenvalues and eigenvectors below. Please refer to specialized textbooks, e.g., Matrix Computations, for more details.

The QR iteration above is just the basic version. It can be further accelerated by a shift technique, as in the power method. In fact, the state-of-the-art Francis algorithm applies the famous implicit double shift, with no QR decomposition being explicitly performed. Instead, the Francis algorithm first reduces the matrix to upper Hessenberg form $A = Q H Q^T$, where $Q$ is orthogonal and $H$ is an upper Hessenberg matrix. The complexity of the Francis algorithm is $O(n^3)$.

When $A$ is symmetric, we usually apply tridiagonalization $A = Q T Q^T$ first, where $T$ is a tridiagonal matrix. Then we apply an implicit QR iteration to $T$ with the Wilkinson shift. This is one of the most remarkable algorithms in numerical linear algebra. We refer to the notes for more details.

$$
\text{Upper Triangular: }
\begin{bmatrix}
\times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times \\
0 & 0 & 0 & 0 & \times
\end{bmatrix}
\quad
\text{Upper Hessenberg: }
\begin{bmatrix}
\times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times \\
0 & \times & \times & \times & \times \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times
\end{bmatrix}
\quad
\text{Tridiagonal: }
\begin{bmatrix}
\times & \times & 0 & 0 & 0 \\
\times & \times & \times & 0 & 0 \\
0 & \times & \times & \times & 0 \\
0 & 0 & \times & \times & \times \\
0 & 0 & 0 & \times & \times
\end{bmatrix}
$$
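In practice you would not implement these steps yourself, but scipy exposes the building blocks. Here is a minimal sketch using scipy.linalg.hessenberg, which yields a tridiagonal matrix for symmetric input, and scipy.linalg.eigh_tridiagonal for the reduced problem (the latter uses its own LAPACK tridiagonal solver, not necessarily the Wilkinson-shift QR iteration described above):

import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

np.random.seed(0)
n = 6
A = np.random.rand(n, n)
A = (A + A.T) / 2

# For symmetric A, the Hessenberg form is tridiagonal: A = Q T Q^T
T, Q = hessenberg(A, calc_q=True)
d = np.diag(T)          # main diagonal of T
e = np.diag(T, k=1)     # first off-diagonal of T

# Solve the tridiagonal eigenproblem
eigvals, eigvecs_T = eigh_tridiagonal(d, e)
print(np.allclose(np.sort(eigvals), np.sort(np.linalg.eigvalsh(A))))   # True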

For the SVD $A = U \Sigma V^T$ with $A \in \mathbb{R}^{m \times n}$, $m \ge n$: since $U$ is the eigenvector matrix of $A A^T$ and $V$ is the eigenvector matrix of $A^T A$, we could apply the QR iteration to $A^T A$ to get its eigenvalues and eigenvectors. However, the practical SVD algorithm does not perform the eigendecomposition of $A^T A$ or $A A^T$ explicitly. Refer to the Golub-Reinsch algorithm for more details. The complexity of the Golub-Reinsch algorithm is usually $O(m^2 n + m n^2 + n^3)$.
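A quick numerical check of this relationship (with an arbitrary random matrix): the singular values of $A$ are the square roots of the eigenvalues of $A^T A$.

import numpy as np

np.random.seed(0)
A = np.random.rand(6, 4)                     # m > n
U, S, VT = np.linalg.svd(A, full_matrices=False)

lam, V = np.linalg.eigh(A.T @ A)             # ascending eigenvalues of A^T A
print(np.allclose(np.sqrt(lam[::-1]), S))    # True: singular values = sqrt(eigenvalues of A^T A)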

Python Implementation

As with linear equations, both numpy and scipy provide a linalg module for eigenvalue and eigenvector problems.

You may need to choose different functions based on:

  • Whether A is symmetric
  • Whether you only need the eigen/singular values
  • Whether you only need the top k or the complete decomposition
import numpy as np

A = np.array([[1, 2],
              [2, 3]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
# If you only need eigenvalues
eigenvalues = np.linalg.eigvals(A)
# Eigendecomposition for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(A)
# If you only need eigenvalues
eigenvalues = np.linalg.eigvalsh(A)
# Perform SVD
# If full_matrices=True (default), U and VT have shapes (M, M) and (N, N);
# if False, the shapes are (M, K) and (K, N), where K = min(M, N)
U, S, VT = np.linalg.svd(A, full_matrices=False)
# If you only need singular values (np.linalg.svdvals requires NumPy >= 2.0;
# otherwise use np.linalg.svd(A, compute_uv=False))
singular_values = np.linalg.svdvals(A)

# Compute the largest k eigenvalues and eigenvectors, when k << n
# (these ARPACK-based solvers need k small relative to n; eigs requires k < n - 1,
#  so we use a larger symmetric test matrix here)
from scipy.sparse.linalg import eigs, eigsh, svds
n = 100
B = np.random.rand(n, n)
B = (B + B.T) / 2
k = 5
eigenvalues, eigenvectors = eigs(B, k=k, which='LM')  # 'LM' selects largest magnitude eigenvalues
# For symmetric matrices
eigenvalues, eigenvectors = eigsh(B, k=k, which='LA')  # 'LA' selects largest algebraic eigenvalues
# Top k singular values and vectors
U, S, VT = svds(B, k=k)

The scipy.sparse.linalg package provides more efficient implementations for large sparse matrices, and it implements specialized algorithms when you only need the top $k$ eigenvalues and eigenvectors. Refer to the scipy documentation for more details. You only need to consider using scipy.sparse.linalg for the top $k$ eigenvalues and eigenvectors when $k \ll n$ and $n$ is very large. We encourage you to test the computational efficiency of the above code to see the difference between the different methods.
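As a starting point for such a comparison, here is a rough timing sketch (the matrix size and k are arbitrary; for a dense random matrix the gain from the sparse solver may be modest, and it is largest for genuinely sparse or structured matrices):

import time
import numpy as np
from scipy.sparse.linalg import eigsh

np.random.seed(0)
n, k = 2000, 5
A = np.random.rand(n, n)
A = (A + A.T) / 2

t0 = time.perf_counter()
np.linalg.eigh(A)                       # full dense eigendecomposition
t1 = time.perf_counter()
eigsh(A, k=k, which='LA')               # top-k eigenpairs only
t2 = time.perf_counter()
print(f"eigh: {t1 - t0:.3f}s, eigsh (k={k}): {t2 - t1:.3f}s")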