
Overview

This chapter covers fine-tuning large language models (LLMs) built on the Transformer architecture. We start from the mathematical foundations of how LLMs are trained and progress through parameter-efficient adaptation and reinforcement-learning-based fine-tuning.

Lectures

  • Next-Token Prediction
    The core pre-training objective: autoregressive factorization, cross-entropy loss, perplexity, and PyTorch implementation.

  • Supervised Fine-Tuning (SFT)
    The SFT loss function with prompt masking, MLE interpretation, and end-to-end training.

  • RL Fine-Tuning with GRPO
    The GRPO objective, group-relative advantage, clipped surrogate loss, KL penalty, and reward function design.

  • Direct Preference Optimization (DPO)
    Aligning models with human preferences directly using preference pairs (chosen vs. rejected) without a reward model.

  • PEFT: LoRA and QLoRA
    Low-rank adaptation math, scaling, parameter savings, and QLoRA with quantized base models.
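As a taste of the first lecture's material, here is a minimal sketch of the pre-training objective: per-step cross-entropy over the vocabulary, averaged into a mean negative log-likelihood, and exponentiated into perplexity. Plain Python, no frameworks; the logits and target ids are toy values invented for illustration.

```python
import math

def cross_entropy(logits, target):
    # log-softmax over the vocabulary, then negative log-likelihood
    # of the target token id (computed stably via the max trick)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

# toy example: a 3-step sequence over a 4-token vocabulary
logits_per_step = [
    [2.0, 0.5, -1.0, 0.0],  # model scores before predicting token 1
    [0.1, 3.0, 0.2, -0.5],  # ... before predicting token 2
    [1.0, 1.0, 1.0, 1.0],   # uniform scores: loss here is exactly log(4)
]
targets = [0, 1, 2]

losses = [cross_entropy(l, t) for l, t in zip(logits_per_step, targets)]
mean_nll = sum(losses) / len(losses)   # the training loss
perplexity = math.exp(mean_nll)        # exp of mean NLL
print(f"mean NLL = {mean_nll:.4f}, perplexity = {perplexity:.4f}")
```

A perplexity of V (here, 4) on a step means the model is no better than a uniform guess over the V-token vocabulary; the uniform third step illustrates this, contributing exactly log(4) to the loss.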