Overview
This chapter covers fine-tuning large language models (LLMs) built on the Transformer architecture. We start from the mathematical foundations of how LLMs are trained and progress through parameter-efficient adaptation and reinforcement-learning-based fine-tuning.
Lectures
- **Next-Token Prediction**
  The core pre-training objective: autoregressive factorization, cross-entropy loss, perplexity, and a PyTorch implementation.
- **Supervised Fine-Tuning (SFT)**
  The SFT loss function with prompt masking, its MLE interpretation, and end-to-end training.
- **RL Fine-Tuning with GRPO**
  The GRPO objective: group-relative advantage, clipped surrogate loss, KL penalty, and reward function design.
- **Direct Preference Optimization (DPO)**
  Aligning models with human preferences directly using preference pairs (chosen vs. rejected), without training a separate reward model.
- **PEFT: LoRA and QLoRA**
  Low-rank adaptation math, scaling, parameter savings, and QLoRA with quantized base models.
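To give a flavor of the first lecture, the next-token objective can be sketched in a few lines of plain Python: average the cross-entropy of the correct token at each position, and exponentiate to get perplexity. This is a toy sketch with made-up numbers, not the lecture's PyTorch code; the helper names are my own.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def next_token_loss(logits_per_step, target_ids):
    """Mean cross-entropy of the target token at each step,
    i.e. the autoregressive negative log-likelihood."""
    losses = [-log_softmax(step)[t]
              for step, t in zip(logits_per_step, target_ids)]
    return sum(losses) / len(losses)

# Toy example: vocabulary of 4 tokens, 2 prediction steps.
logits = [[2.0, 0.5, 0.1, -1.0],   # step 1: model favors token 0
          [0.0, 3.0, 0.0, 0.0]]    # step 2: model favors token 1
targets = [0, 1]                   # the tokens that actually came next
loss = next_token_loss(logits, targets)
ppl = math.exp(loss)               # perplexity = exp(mean cross-entropy)
```

Since the model assigns its highest logit to each target here, the loss comes out well below the uniform-guessing baseline of `log(4)`.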
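The prompt masking idea from the SFT lecture reduces to one line: average the per-token loss over response positions only, so the model is never trained to reproduce the prompt. A minimal sketch with invented numbers (the function name is my own):

```python
def sft_loss(token_losses, response_mask):
    """Average per-token loss over response tokens only; prompt
    positions (mask = 0) are excluded from the objective."""
    masked = [l * m for l, m in zip(token_losses, response_mask)]
    return sum(masked) / sum(response_mask)

# 5-token sequence: first 3 tokens are the prompt, last 2 the response.
per_token = [1.2, 0.9, 1.5, 0.4, 0.6]
mask      = [0,   0,   0,   1,   1]
loss = sft_loss(per_token, mask)   # averages only 0.4 and 0.6
```

In PyTorch this masking is conventionally done by setting prompt labels to the loss function's ignore index rather than multiplying by an explicit mask, but the effect is the same.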
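The "group-relative advantage" in the GRPO lecture is a per-group reward normalization: each sampled completion's reward is centered and scaled by its group's statistics before weighting the clipped surrogate loss. A toy sketch under that reading (reward values invented):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by its group's mean and standard
    deviation, yielding the group-relative advantage."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, a group of 4 sampled completions with scalar rewards.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Above-average completions get positive advantage, below-average
# ones negative; the advantages sum to (approximately) zero.
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed.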
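The DPO objective for a single preference pair can likewise be written in closed form: the negative log-sigmoid of the beta-scaled difference between the policy-vs-reference log-ratios of the chosen and rejected responses. A sketch with illustrative log-probabilities (all numbers invented):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * margin), where margin is the gap between
    the chosen and rejected policy-vs-reference log-ratios.
    Written as log1p(exp(-m)) for numerical stability."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))

# Policy has raised the chosen response and lowered the rejected one
# relative to the frozen reference model -> loss below log(2).
improved = dpo_loss(-9.0, -11.0, -10.0, -10.0)
# Policy identical to the reference -> margin 0, loss exactly log(2).
untrained = dpo_loss(-10.0, -10.0, -10.0, -10.0)
```

The reference log-probabilities play the role of the reward model's anchor: no separate reward model is trained, matching the lecture's framing.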
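Finally, the LoRA forward pass is just the frozen base weight plus a rank-r update B @ A, scaled by alpha / r. A dependency-free sketch with tiny hand-written matrices (shapes and values are illustrative only):

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha / r) * B (A x): frozen base weight W plus a
    rank-r low-rank update, scaled by alpha / r."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))   # A: r x d, B: d x r
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[0.3, -0.2], [0.1, 0.4]]  # trainable down-projection (r x d)
B = [[0.0, 0.0], [0.0, 0.0]]   # trainable up-projection, zero-initialized
x = [1.0, 2.0]
y = lora_forward(W, A, B, x)
```

With B initialized to zero the adapter contributes nothing, so the adapted layer starts out exactly equal to the base layer, and only 2 * d * r adapter parameters are trained instead of d * d.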
