Overview¶
This chapter explores the concept of Foundation Models—large-scale models trained on vast amounts of data that can be adapted to a wide range of downstream tasks. We will use single-nucleotide polymorphisms (SNPs) as a running example to show how to build transformer-based foundation models beyond natural language.
- SNP Tokenization & Architecture: How to tokenize DNA sequences into K-mers and why encoder-only architectures are preferred.
- SNP Model Training: Pretraining objectives (Masked Language Modeling) and downstream tasks (eQTL prediction).
- SNP Fine-Tuning: Practical tutorial on fine-tuning genomic models using LoRA.
- Advanced Genomic Models: Exploring long-range models (Enformer) and single-cell foundation models (scGPT).
- Model Interpretation: Using Saliency Maps and attention analysis to interpret model predictions biologically.
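As a preview of the tokenization topic above, here is a minimal sketch of K-mer tokenization: a DNA sequence is split into overlapping substrings of length k, each of which becomes one token. The function name and parameters here are illustrative assumptions, not the API of any specific genomic library; real models add special tokens and map each k-mer to a vocabulary index.

```python
def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mer tokens.

    Hypothetical minimal example: slides a window of length k over the
    sequence, advancing by `stride` bases each step.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

print(kmer_tokenize("ATGCGT", k=3))
# → ['ATG', 'TGC', 'GCG', 'CGT']
```

With stride 1 adjacent tokens share k−1 bases, which preserves local context but inflates sequence length; larger strides trade context overlap for shorter inputs.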
