Duality and ADMM¶
Composite Objective Function¶
We have introduced proximal gradient descent to solve optimization problems of the form

$$\min_{\beta}\; f(\beta) + g(\beta),$$

where $f$ is smooth and convex and $g$ is convex with an easily computed proximal operator. For example, when $g(\beta) = \lambda\|\beta\|_1$ (the Lasso penalty), the proximal operator is the soft-thresholding operator

$$\operatorname{prox}_{\lambda\|\cdot\|_1}(y) = \operatorname{sign}(y)\,\max\big(|y| - \lambda,\, 0\big),$$

where the sign, absolute value, and maximum are applied elementwise. Unlike the Lasso penalty, a penalty that couples the coordinates, such as the fused Lasso penalty $\lambda\|D\beta\|_1$ for a difference matrix $D$, generally has no closed-form proximal operator, so proximal gradient descent is no longer convenient.
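As a quick numerical check (our own snippet, not part of the original notes), elementwise soft-thresholding indeed solves the one-dimensional problem $\min_x \frac{1}{2}(x - y)^2 + \lambda|x|$; a brute-force grid search lands on essentially the same value:

import torch

def soft_thresholding(y, threshold):
    return torch.sign(y) * torch.clamp(torch.abs(y) - threshold, min=0)

y, lam = torch.tensor(0.7), 0.5
grid = torch.linspace(-3.0, 3.0, 100001)                  # brute-force search over a fine grid
objective = 0.5 * (grid - y) ** 2 + lam * torch.abs(grid)
print(grid[torch.argmin(objective)])                      # approximately 0.2
print(soft_thresholding(y, lam))                          # exactly 0.2 = sign(0.7) * (0.7 - 0.5)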
We now discuss how to solve composite objective functions. Generally, we are interested in the following composite optimization problem:

$$\min_{\beta, z}\; f(\beta) + g(z) \quad \text{subject to} \quad A\beta + Bz = c,$$

for some convex functions $f$ and $g$, matrices $A$ and $B$, and a vector $c$. For example, the fused Lasso fits this form with $f(\beta) = \frac{1}{2}\|Y - X\beta\|_2^2$, $g(z) = \lambda\|z\|_1$, $A = D$, $B = -I$, and $c = 0$. Another example is basis pursuit:

$$\min_{\beta}\; \|\beta\|_1 \quad \text{subject to} \quad X\beta = Y,$$

which corresponds to $f(\beta) = \|\beta\|_1$, $g \equiv 0$, $A = X$, $B = 0$, and $c = Y$.
Duality¶
To solve the composite optimization problem, we start by reviewing duality in optimization. We define the Lagrange multiplier function (Lagrangian) of the problem as:

$$L(\beta, z, u) = f(\beta) + g(z) + u^\top(A\beta + Bz - c),$$

where $u$ is the vector of Lagrange multipliers (the dual variable). This allows us to convert the primal problem to the dual problem:

$$\max_{u}\; h(u), \qquad h(u) = \min_{\beta, z}\; L(\beta, z, u).$$
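To see why the dual problem is useful, note the following weak duality argument (a standard step we spell out here): for any $u$ and any feasible pair $(\beta, z)$ with $A\beta + Bz = c$,

$$h(u) = \min_{\beta', z'}\; L(\beta', z', u) \;\le\; L(\beta, z, u) \;=\; f(\beta) + g(z) + u^\top(A\beta + Bz - c) \;=\; f(\beta) + g(z),$$

so the dual optimal value $\max_u h(u)$ never exceeds the primal optimal value.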
Duality is crucial in optimization. The following theorem implies we can solve the primal problem via the dual problem.
Theorem (Strong Duality): Given convex functions $f$ and $g$, suppose the primal problem has a feasible point in the relative interior of the domains of $f$ and $g$ and its optimal value is finite. Then the optimal values of the primal and dual problems coincide, and for any dual optimal $u^*$, every primal optimal $(\beta^*, z^*)$ minimizes $L(\beta, z, u^*)$ over $(\beta, z)$.
Consider the dual problem $\max_u h(u)$. As $h$ is an infimum of functions that are affine in $u$, it is always concave, and the maximization over $u$ carries no constraints, so the dual problem is an unconstrained concave maximization problem even when the primal constraint set is complicated.
The strong duality property is useful for converting a linearly constrained problem into an unconstrained dual problem. The following example shows how to find the dual problem of the Lasso.
Example: Duality of Lasso¶
As the Lasso

$$\min_{\beta}\; \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda\|\beta\|_1$$

is an unconstrained problem, we first convert it to the standard form by introducing $z = X\beta$:

$$\min_{\beta, z}\; \frac{1}{2}\|Y - z\|_2^2 + \lambda\|\beta\|_1 \quad \text{subject to} \quad z = X\beta.$$

The Lagrange function is:

$$L(\beta, z, u) = \frac{1}{2}\|Y - z\|_2^2 + \lambda\|\beta\|_1 + u^\top(z - X\beta).$$
Notice that the minimization of $L(\beta, z, u)$ over $(\beta, z)$ separates into two independent problems:

$$\min_{\beta}\; \lambda\|\beta\|_1 - u^\top X\beta \qquad \text{and} \qquad \min_{z}\; \frac{1}{2}\|Y - z\|_2^2 + u^\top z.$$

The second problem is quadratic, and we have:

$$z = Y - u, \qquad \min_{z}\; \frac{1}{2}\|Y - z\|_2^2 + u^\top z = u^\top Y - \frac{1}{2}\|u\|_2^2.$$

Applying the first-order optimality condition to the first problem, we get:

$$0 \in \lambda\,\partial\|\beta\|_1 - X^\top u, \qquad \text{i.e.,} \qquad X^\top u \in \lambda\,\partial\|\beta\|_1.$$

As each entry of a subgradient of $\|\beta\|_1$ lies in $[-1, 1]$, this condition can only hold when $\|X^\top u\|_\infty \le \lambda$, in which case $\beta = 0$ attains the minimum value $0$; otherwise the objective is unbounded below. Therefore, we have:

$$\min_{\beta}\; \lambda\|\beta\|_1 - u^\top X\beta = \begin{cases} 0, & \|X^\top u\|_\infty \le \lambda, \\ -\infty, & \text{otherwise.} \end{cases}$$
Combining the results, the dual problem becomes:

$$\max_{u}\; u^\top Y - \frac{1}{2}\|u\|_2^2 \quad \text{subject to} \quad \|X^\top u\|_\infty \le \lambda.$$

Therefore, after completing the square, the dual problem of the Lasso is:

$$\max_{u}\; \frac{1}{2}\|Y\|_2^2 - \frac{1}{2}\|Y - u\|_2^2 \quad \text{subject to} \quad \|X^\top u\|_\infty \le \lambda.$$

This is equivalent to:

$$\min_{u}\; \|Y - u\|_2^2 \quad \text{subject to} \quad \|X^\top u\|_\infty \le \lambda,$$

i.e., projecting $Y$ onto the polytope $\{u : \|X^\top u\|_\infty \le \lambda\}$. By the duality, the dual variable at the optimum is the Lasso residual, $\hat u = Y - X\hat\beta$, since the $z$-minimization gives $z = Y - u$ while feasibility requires $z = X\beta$.
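As a numerical illustration (our own sketch, not part of the original derivation), we can solve the Lasso by proximal gradient descent and verify that the residual $Y - X\hat\beta$ is dual feasible, i.e., $\|X^\top(Y - X\hat\beta)\|_\infty \le \lambda$:

import torch

def soft_thresholding(y, threshold):
    return torch.sign(y) * torch.clamp(torch.abs(y) - threshold, min=0)

torch.manual_seed(0)
n, d = 50, 10
X = torch.randn(n, d)
Y = torch.randn(n)
lambda_ = 1.0

# Solve the Lasso by proximal gradient descent (ISTA)
beta = torch.zeros(d)
step = 1.0 / torch.linalg.matrix_norm(X, ord=2) ** 2      # 1 / Lipschitz constant of the gradient
for t in range(5000):
    grad = X.T @ (X @ beta - Y)
    beta = soft_thresholding(beta - step * grad, step * lambda_)

# The optimal dual variable is the residual; check dual feasibility
u_hat = Y - X @ beta
print(torch.max(torch.abs(X.T @ u_hat)).item(), lambda_)  # left value should not exceed lambda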
Alternating Direction Method of Multipliers (ADMM)¶
Consider an equivalent form of the problem by adding a quadratic term:

$$\min_{\beta, z}\; f(\beta) + g(z) + \frac{\rho}{2}\|A\beta + Bz - c\|_2^2 \quad \text{subject to} \quad A\beta + Bz = c,$$

for some $\rho > 0$; the added term vanishes on the feasible set, so the solutions are unchanged. We introduce the augmented Lagrangian:

$$L_\rho(\beta, z, u) = f(\beta) + g(z) + u^\top(A\beta + Bz - c) + \frac{\rho}{2}\|A\beta + Bz - c\|_2^2.$$

To solve the dual problem

$$\max_{u}\; \min_{\beta, z}\; L_\rho(\beta, z, u),$$

we iterate between the following two steps.
Primal step:

$$(\beta^{(t+1)}, z^{(t+1)}) = \arg\min_{\beta, z}\; L_\rho(\beta, z, u^{(t)}).$$

Dual step:

$$u^{(t+1)} = \arg\max_{u}\; \Big\{ L_\rho\big(\beta^{(t+1)}, z^{(t+1)}, u\big) - \frac{1}{2\rho}\big\|u - u^{(t)}\big\|_2^2 \Big\} = u^{(t)} + \rho\big(A\beta^{(t+1)} + Bz^{(t+1)} - c\big).$$

We add a proximal term in the dual step because the augmented Lagrangian is linear in $u$, so maximizing it over $u$ alone would be unbounded; with the proximal term the maximization has the closed-form solution above, which is simply a gradient ascent step on the dual with step size $\rho$.
Minimizing the augmented Lagrangian jointly over $(\beta, z)$ in the primal step can be as hard as the original problem. The idea of ADMM is to replace the joint minimization with one pass of alternating minimization: first over $\beta$ with $z$ fixed, then over $z$ with $\beta$ fixed. Plugging the augmented Lagrangian into the primal and dual steps, we have the following algorithm.

Alternating Direction Method of Multipliers (ADMM)

Repeat for $t = 0, 1, 2, \dots$:

$$\beta^{(t+1)} = \arg\min_{\beta}\; L_\rho(\beta, z^{(t)}, u^{(t)}),$$
$$z^{(t+1)} = \arg\min_{z}\; L_\rho(\beta^{(t+1)}, z, u^{(t)}),$$
$$u^{(t+1)} = u^{(t)} + \rho\big(A\beta^{(t+1)} + Bz^{(t+1)} - c\big).$$
If $f$ and $g$ are closed, proper, and convex and strong duality holds, the iterates satisfy $A\beta^{(t)} + Bz^{(t)} - c \to 0$ and $f(\beta^{(t)}) + g(z^{(t)})$ converges to the optimal value; no differentiability of $f$ or $g$ is required.
ADMM is not a first-order method

ADMM is not a first-order method: the sub-problems in the primal step use the functions $f$ and $g$ themselves, not merely their gradients. Whether ADMM is efficient therefore depends on whether these sub-problems can be solved cheaply, for example in closed form or via a proximal operator. A generic sketch of the iteration is given below.
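To make the structure of the iteration concrete, here is a minimal generic template together with a Lasso usage (our own sketch, not from the notes; `argmin_beta` and `argmin_z` are hypothetical placeholders for problem-specific sub-problem solvers):

import torch

def admm(argmin_beta, argmin_z, A, B, c, beta0, z0, u0, rho=1.0, num_iters=100):
    # Generic ADMM for min f(beta) + g(z) subject to A beta + B z = c.
    # argmin_beta(z, u) must return the minimizer over beta of L_rho(beta, z, u);
    # argmin_z(beta, u) must return the minimizer over z of L_rho(beta, z, u).
    beta, z, u = beta0, z0, u0
    for t in range(num_iters):
        beta = argmin_beta(z, u)                # primal step, first block (uses f itself)
        z = argmin_z(beta, u)                   # primal step, second block (uses g itself)
        u = u + rho * (A @ beta + B @ z - c)    # dual ascent step with step size rho
    return beta, z, u

# Usage for the Lasso with the split beta - z = 0 (A = I, B = -I, c = 0)
torch.manual_seed(0)
n, d, lam, rho = 30, 8, 0.5, 1.0
X, Y = torch.randn(n, d), torch.randn(n)
chol = torch.linalg.cholesky(X.T @ X + rho * torch.eye(d))
argmin_beta = lambda z, u: torch.cholesky_solve((X.T @ Y + rho * z - u).unsqueeze(1), chol).squeeze(1)
argmin_z = lambda beta, u: torch.sign(beta + u / rho) * torch.clamp(torch.abs(beta + u / rho) - lam / rho, min=0)
beta_hat, z_hat, u_hat = admm(argmin_beta, argmin_z, torch.eye(d), -torch.eye(d), torch.zeros(d),
                              torch.zeros(d), torch.zeros(d), torch.zeros(d), rho=rho)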
Example: Fused Lasso¶
Applying ADMM to the fused Lasso

$$\min_{\beta}\; \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda\|D\beta\|_1,$$

we introduce the auxiliary variable $z = D\beta$, so that $f(\beta) = \frac{1}{2}\|Y - X\beta\|_2^2$, $g(z) = \lambda\|z\|_1$, and the constraint is $D\beta - z = 0$. The updating rule is:

$$\beta^{(t+1)} = \arg\min_{\beta}\; \frac{1}{2}\|Y - X\beta\|_2^2 + u^{(t)\top} D\beta + \frac{\rho}{2}\|D\beta - z^{(t)}\|_2^2,$$
$$z^{(t+1)} = \arg\min_{z}\; \lambda\|z\|_1 - u^{(t)\top} z + \frac{\rho}{2}\|D\beta^{(t+1)} - z\|_2^2,$$
$$u^{(t+1)} = u^{(t)} + \rho\big(D\beta^{(t+1)} - z^{(t+1)}\big).$$

This yields the algorithm:

$$\beta^{(t+1)} = \big(X^\top X + \rho D^\top D\big)^{-1}\big(X^\top Y + \rho D^\top z^{(t)} - D^\top u^{(t)}\big),$$
$$z^{(t+1)} = \mathcal{S}_{\lambda/\rho}\big(D\beta^{(t+1)} + u^{(t)}/\rho\big),$$
$$u^{(t+1)} = u^{(t)} + \rho\big(D\beta^{(t+1)} - z^{(t+1)}\big),$$

where $\mathcal{S}_{\kappa}$ denotes elementwise soft-thresholding with threshold $\kappa$.
When $\rho > 0$ and $X^\top X + \rho D^\top D$ is positive definite, the $\beta$-update is a single linear solve with a coefficient matrix that does not change across iterations, and the $z$-update is an elementwise soft-thresholding.
Here is the implementation of ADMM for the fused Lasso in PyTorch. Notice that, following the tips in Linear Algebra, we pre-compute the Cholesky decomposition of the matrix $X^\top X + \rho D^\top D$ once, since it does not change across iterations.
import torch

# Soft-thresholding: the proximal operator of the l1 norm
def soft_thresholding(y, threshold):
    return torch.sign(y) * torch.clamp(torch.abs(y) - threshold, min=0)

# Define problem parameters
torch.manual_seed(42)
n, d = 10, 5                   # Number of samples (n) and features (d)
X = torch.randn(n, d)          # Design matrix
Y = torch.randn(n)             # Response vector

# First-order difference matrix D for the fused lasso, shape (d-1, d)
I = torch.eye(d)
D = I[:-1, :] - I[1:, :]       # Row i is e_i - e_{i+1}

# Initialize variables
beta = torch.zeros(d)          # Coefficients
z = torch.zeros(d - 1)         # Auxiliary variable z = D beta
u = torch.zeros(d - 1)         # Dual variable (Lagrange multiplier)

# ADMM parameters
lambda_ = 0.1                  # Regularization parameter
rho = 1.0                      # ADMM penalty parameter
num_iters = 100                # Number of iterations

# Pre-compute the Cholesky decomposition of (X^T X + rho D^T D),
# which stays fixed across iterations (more stable than direct inversion)
XtX = X.T @ X
DtD = D.T @ D
A = XtX + rho * DtD            # System matrix of the beta-update
L = torch.linalg.cholesky(A)   # Lower-triangular Cholesky factor

for t in range(num_iters):
    # beta-update: solve (X^T X + rho D^T D) beta = X^T Y + rho D^T z - D^T u
    b_rhs = (X.T @ Y + rho * D.T @ z - D.T @ u).unsqueeze(1)
    w = torch.linalg.solve_triangular(L, b_rhs, upper=False)             # Forward substitution
    beta = torch.linalg.solve_triangular(L.T, w, upper=True).squeeze(1)  # Backward substitution

    # z-update: elementwise soft-thresholding
    z = soft_thresholding(D @ beta + u / rho, lambda_ / rho)

    # Dual update
    u = u + rho * (D @ beta - z)
When the Cholesky factor is pre-computed, each ADMM iteration only requires two triangular solves and a few matrix-vector products, so running many iterations is inexpensive even for moderately large $d$.
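As a quick diagnostic (our own addition, reusing the variables from the code above), one can print the fused-Lasso objective and the primal residual $\|D\beta - z\|_2$ after the loop; both should stabilize as ADMM converges:

# Objective value and primal residual after the ADMM iterations above
objective = 0.5 * torch.sum((Y - X @ beta) ** 2) + lambda_ * torch.sum(torch.abs(D @ beta))
primal_residual = torch.linalg.vector_norm(D @ beta - z)
print(f"objective: {objective.item():.4f}, primal residual: {primal_residual.item():.2e}")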
Example: Graphical Lasso¶
Given i.i.d. samples $X_1, \dots, X_n \in \mathbb{R}^p$ with (centered) sample covariance matrix $S = \frac{1}{n}\sum_{i=1}^n X_i X_i^\top$, the graphical Lasso estimates a sparse precision matrix by solving

$$\min_{\Theta \succ 0}\; -\log\det\Theta + \operatorname{tr}(S\Theta) + \lambda\|\Theta\|_1,$$

where $\|\Theta\|_1 = \sum_{j,k}|\Theta_{jk}|$. Introducing $Z = \Theta$, this fits the composite form with $f(\Theta) = -\log\det\Theta + \operatorname{tr}(S\Theta)$, $g(Z) = \lambda\|Z\|_1$, and the constraint $\Theta - Z = 0$.
Applying ADMM, we get:

$$\Theta^{(t+1)} = \arg\min_{\Theta \succ 0}\; -\log\det\Theta + \operatorname{tr}(S\Theta) + \operatorname{tr}\big(U^{(t)\top}(\Theta - Z^{(t)})\big) + \frac{\rho}{2}\|\Theta - Z^{(t)}\|_F^2,$$
$$Z^{(t+1)} = \mathcal{S}_{\lambda/\rho}\big(\Theta^{(t+1)} + U^{(t)}/\rho\big),$$
$$U^{(t+1)} = U^{(t)} + \rho\big(\Theta^{(t+1)} - Z^{(t+1)}\big),$$

where the Frobenius norm is $\|M\|_F = \big(\sum_{j,k} M_{jk}^2\big)^{1/2}$ and $\mathcal{S}_{\lambda/\rho}$ applies soft-thresholding elementwise. The $\Theta$-update has a closed form. Its stationarity condition is $\rho\Theta - \Theta^{-1} = \rho Z^{(t)} - U^{(t)} - S$, where for the spectral decomposition $\rho Z^{(t)} - U^{(t)} - S = Q\,\operatorname{diag}(d_1, \dots, d_p)\,Q^\top$ the solution is

$$\Theta^{(t+1)} = Q\,\operatorname{diag}(\theta_1, \dots, \theta_p)\,Q^\top, \qquad \theta_j = \frac{d_j + \sqrt{d_j^2 + 4\rho}}{2\rho} > 0,$$

which is automatically positive definite.
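The following is a minimal PyTorch sketch of these updates (our own illustration, not from the notes; variable names such as `lam_reg` are ours, and the sample covariance here comes from synthetic data):

import torch

# Elementwise soft-thresholding
def soft_thresholding(M, threshold):
    return torch.sign(M) * torch.clamp(torch.abs(M) - threshold, min=0)

torch.manual_seed(0)
p, n = 5, 200
samples = torch.randn(n, p)            # synthetic i.i.d. samples
samples = samples - samples.mean(dim=0)
S = samples.T @ samples / n            # sample covariance matrix

lam_reg = 0.1                          # regularization parameter lambda
rho = 1.0                              # ADMM penalty parameter
Theta = torch.eye(p)                   # precision matrix iterate
Z = torch.eye(p)                       # auxiliary variable
U = torch.zeros(p, p)                  # dual variable

for t in range(200):
    # Theta-update: solve rho*Theta - Theta^{-1} = rho*Z - U - S via eigendecomposition
    M = rho * Z - U - S
    d_vals, Q = torch.linalg.eigh(M)   # M = Q diag(d_vals) Q^T
    theta_vals = (d_vals + torch.sqrt(d_vals ** 2 + 4 * rho)) / (2 * rho)
    Theta = Q @ torch.diag(theta_vals) @ Q.T

    # Z-update: elementwise soft-thresholding
    Z = soft_thresholding(Theta + U / rho, lam_reg / rho)

    # Dual update
    U = U + rho * (Theta - Z)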
Example: Consensus Optimization¶
We design a divide-and-conquer ADMM for the Lasso for massive data, when the sample size $n$ is so large that the data cannot be stored or processed on a single machine. Splitting the rows of the data into $K$ blocks $(X_1, Y_1), \dots, (X_K, Y_K)$, the Lasso

$$\min_{\beta}\; \sum_{k=1}^{K} \frac{1}{2}\|Y_k - X_k\beta\|_2^2 + \lambda\|\beta\|_1$$

can be formulated as the general block separable form:

$$\min_{\beta_1, \dots, \beta_K,\, z}\; \sum_{k=1}^{K} \frac{1}{2}\|Y_k - X_k\beta_k\|_2^2 + \lambda\|z\|_1 \quad \text{subject to} \quad \beta_k = z, \quad k = 1, \dots, K,$$

where each block keeps its own copy $\beta_k$ of the coefficients and the constraints force all copies to reach consensus.
Applying ADMM, we get:

$$\beta_k^{(t+1)} = \arg\min_{\beta_k}\; \frac{1}{2}\|Y_k - X_k\beta_k\|_2^2 + u_k^{(t)\top}\beta_k + \frac{\rho}{2}\|\beta_k - z^{(t)}\|_2^2, \quad k = 1, \dots, K,$$
$$z^{(t+1)} = \arg\min_{z}\; \lambda\|z\|_1 - \Big(\sum_{k=1}^{K} u_k^{(t)}\Big)^\top z + \frac{\rho}{2}\sum_{k=1}^{K}\|\beta_k^{(t+1)} - z\|_2^2,$$
$$u_k^{(t+1)} = u_k^{(t)} + \rho\big(\beta_k^{(t+1)} - z^{(t+1)}\big), \quad k = 1, \dots, K.$$

ADMM for Consensus Optimization

Repeat for $t = 0, 1, 2, \dots$:

$$\beta_k^{(t+1)} = \big(X_k^\top X_k + \rho I\big)^{-1}\big(X_k^\top Y_k + \rho z^{(t)} - u_k^{(t)}\big), \quad k = 1, \dots, K,$$
$$z^{(t+1)} = \mathcal{S}_{\lambda/(K\rho)}\big(\bar\beta^{(t+1)} + \bar u^{(t)}/\rho\big), \qquad \bar\beta^{(t+1)} = \frac{1}{K}\sum_{k} \beta_k^{(t+1)}, \quad \bar u^{(t)} = \frac{1}{K}\sum_{k} u_k^{(t)},$$
$$u_k^{(t+1)} = u_k^{(t)} + \rho\big(\beta_k^{(t+1)} - z^{(t+1)}\big), \quad k = 1, \dots, K.$$
In the first step, we update each $\beta_k$ using only the $k$-th block of data, so the $K$ ridge-type sub-problems can be solved in parallel on different machines; the machines only need to communicate the averages $\bar\beta$ and $\bar u$ to carry out the $z$-update, after which the dual updates are again local. A minimal sketch of this scheme is given below.
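Here is a minimal PyTorch sketch of consensus ADMM for the Lasso (our own illustration; in a real deployment each block's update would run on a separate machine, while the loop over blocks below merely mimics that parallelism):

import torch

# Elementwise soft-thresholding
def soft_thresholding(y, threshold):
    return torch.sign(y) * torch.clamp(torch.abs(y) - threshold, min=0)

torch.manual_seed(0)
n, d, K = 1200, 20, 4                          # total samples, features, number of blocks
X = torch.randn(n, d)
beta_true = torch.zeros(d)
beta_true[:3] = torch.tensor([2.0, -1.0, 1.5])
Y = X @ beta_true + 0.1 * torch.randn(n)

# Split the rows into K blocks; in practice each block lives on a different machine
Xs = list(torch.chunk(X, K))
Ys = list(torch.chunk(Y, K))

lambda_, rho, num_iters = 5.0, 1.0, 200
z = torch.zeros(d)                             # consensus variable
betas = [torch.zeros(d) for _ in range(K)]     # local copies beta_k
us = [torch.zeros(d) for _ in range(K)]        # local dual variables u_k

# Pre-compute the Cholesky factor of (X_k^T X_k + rho I) for each block
chols = [torch.linalg.cholesky(Xk.T @ Xk + rho * torch.eye(d)) for Xk in Xs]

for t in range(num_iters):
    # beta_k-updates: ridge-type solves, embarrassingly parallel across blocks
    for k in range(K):
        rhs = (Xs[k].T @ Ys[k] + rho * z - us[k]).unsqueeze(1)
        betas[k] = torch.cholesky_solve(rhs, chols[k]).squeeze(1)

    # z-update: soft-thresholding of the averaged local variables
    beta_bar = torch.stack(betas).mean(dim=0)
    u_bar = torch.stack(us).mean(dim=0)
    z = soft_thresholding(beta_bar + u_bar / rho, lambda_ / (K * rho))

    # Dual updates (again local to each block)
    for k in range(K):
        us[k] = us[k] + rho * (betas[k] - z)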