
Overview

This chapter covers fine-tuning large language models (LLMs) built on the Transformer architecture. We start from the mathematical foundations of how LLMs are trained and progress through parameter-efficient adaptation and reinforcement-learning-based fine-tuning.

Lectures

  • Next-Token Prediction
    The core pre-training objective: autoregressive factorization, cross-entropy loss, perplexity, and PyTorch implementation.

  • Supervised Fine-Tuning (SFT)
    The SFT loss function with prompt masking, MLE interpretation, and end-to-end training.

  • RL Fine-Tuning with GRPO
    The GRPO objective, group-relative advantage, clipped surrogate loss, KL penalty, and reward function design.

  • Direct Preference Optimization (DPO)
    Aligning models with human preferences directly using preference pairs (chosen vs. rejected) without a reward model.

  • PEFT: LoRA and QLoRA
    Low-rank adaptation math, scaling, parameter savings, and QLoRA with quantized base models.
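As a taste of the first lecture's material, here is a minimal sketch of the pre-training objective: per-step cross-entropy over the vocabulary, averaged into a mean negative log-likelihood, and exponentiated into perplexity. Plain Python, no frameworks; the logits and target ids are toy values invented for illustration.

```python
import math

def cross_entropy(logits, target):
    # log-softmax over the vocabulary, then negative log-likelihood
    # of the target token id (computed stably via the max trick)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

# toy example: a 3-step sequence over a 4-token vocabulary
logits_per_step = [
    [2.0, 0.5, -1.0, 0.0],  # model scores before predicting token 1
    [0.1, 3.0, 0.2, -0.5],  # ... before predicting token 2
    [1.0, 1.0, 1.0, 1.0],   # uniform scores: loss here is exactly log(4)
]
targets = [0, 1, 2]

losses = [cross_entropy(l, t) for l, t in zip(logits_per_step, targets)]
mean_nll = sum(losses) / len(losses)   # the training loss
perplexity = math.exp(mean_nll)        # exp of mean NLL
print(f"mean NLL = {mean_nll:.4f}, perplexity = {perplexity:.4f}")
```

A perplexity of V (here, 4) on a step means the model is no better than a uniform guess over the V-token vocabulary; the uniform third step illustrates this, contributing exactly log(4) to the loss.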