Junwei Lu, Harvard University

BST 236: Computing I

Course Website [Link]

Syllabus

This is a beginner-friendly course on statistical computing in the era of generative AI. It covers the following topics:

Workflow

Principles of good coding practices, code efficiency and reproducibility, Git and GitHub version control, Makefile automation, virtual environments

Code with AI

AI copilot integration, prompt engineering, AI-assisted coding

Data Structures

Python built-in data structures (lists, dictionaries, deque), NumPy arrays and operations, Pandas DataFrames and data manipulation

Algorithms

Computational complexity analysis, memory complexity, recursion techniques, backtracking algorithms, dynamic programming, greedy algorithms

Numerical Linear Algebra

Algorithms for solving linear equations, eigenvalue decomposition, singular value decomposition, matrix factorizations

Optimization

Convex optimization methods, stochastic optimization algorithms, non-convex optimization techniques, gradient-based methods

AI Methods

PyTorch framework, neural network architectures, transformer models, deep learning training and optimization

BST 235: Advanced Regression and Statistical Learning

Lecture Notes [PDF]

Syllabus

This is an advanced course on the methods and theory of high dimensional statistics, statistical machine learning, and large-scale inference and optimization. It aims to bring students quickly to the frontier of the interdisciplinary areas spanning statistics, optimization, probability, and machine learning. It covers the following topics:

High dimensional probability

Concentration inequalities, Sub-Gaussian random variables, Chernoff bounds, Hoeffding's inequality, Maximal inequalities

High dimensional linear regression

Ordinary least squares, Compressed sensing, Lasso, Variations of Lasso (group lasso, fused lasso, adaptive lasso, etc.), General high dimensional M-estimators, Variable selection consistency

High dimensional optimization

Convex geometry, Lagrange duality, Gradient descent, Proximal gradient descent, LARS, ADMM, Mirror descent, Stochastic optimization

Large-Scale Inference

Linear model hypothesis testing, High dimensional inference, Chi-square test, Maximal test, False discovery rate control, Knockoff filter

BST 263: Statistical Learning

Lecture Notes [PDF]

Syllabus

This course introduces statistical machine learning under the broad umbrella of data science and modern analytics. Topics include:

Regression Methods

OLS, Lasso, Ridge, Elastic-Net, Tree-based methods, Random Forests

Classification Methods

Logistic regression, Support vector machine, Linear discriminant analysis, Naive Bayes, Boosting

Clustering Methods

Mixture of Gaussians, Expectation maximization algorithm, K-means

Advanced Topics

Deep learning, Reinforcement learning