Overview¶
This chapter explores the concept of Foundation Models—large-scale models trained on vast amounts of data that can be adapted to a wide range of downstream tasks. We will use single-nucleotide polymorphisms (SNPs) as a running example to show how to build transformer-based foundation models beyond natural language.
- SNP Tokenization & Architecture: How to tokenize DNA sequences into K-mers and why encoder-only architectures are preferred.
- SNP Model Training: Pretraining objectives (Masked Language Modeling) and downstream tasks (eQTL prediction).
- SNP Fine-Tuning: Practical tutorial on fine-tuning genomic models using LoRA.
- Advanced Genomic Models: Exploring long-range models (Enformer) and single-cell foundation models (scGPT).
- Model Interpretation: Using Saliency Maps and attention analysis to interpret model predictions biologically.
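As a preview of the tokenization topic above, here is a minimal sketch of K-mer tokenization: a DNA sequence is split into overlapping substrings of length k, each of which becomes one token. The function name and parameters here are illustrative assumptions, not the API of any specific genomic library; real models add special tokens and map each k-mer to a vocabulary index.

```python
def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a DNA sequence into overlapping k-mer tokens.

    Hypothetical minimal example: slides a window of length k over the
    sequence, advancing by `stride` bases each step.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

print(kmer_tokenize("ATGCGT", k=3))
# → ['ATG', 'TGC', 'GCG', 'CGT']
```

With stride 1 adjacent tokens share k−1 bases, which preserves local context but inflates sequence length; larger strides trade context overlap for shorter inputs.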
