About this course
This course aims to be a beginner-friendly introduction to statistical computing.
Target audience
Whether you're taking your first steps into algorithms and data analysis, or you're looking to strengthen your foundation in statistical computing and establish efficient workflows, this course is designed for you! The course emphasizes breadth over depth. We want to show you the whole landscape of the computing world in one semester, to prepare you for zooming in on a specific research direction in the future.
Prerequisites
You should know how to read and write simple code in Python and R. You should also have basic knowledge of probability, statistics, and linear algebra. You can refer to Python 101 for the basic Python we expect you to know.
To finish the homework, you may need to know some Markdown and LaTeX syntax, but this is not required if you are only interested in learning the course materials.
Content structure
The main content of the book is shown in the figure below.
- Workflow: Covers principles of good coding, especially efficiency, reproducibility, and the use of AI copilots. We cover Git and GitHub, Makefile, and virtual environments.
- Data Structures: Focuses on Python data structures such as lists, dictionaries, and deque, as well as NumPy arrays and pandas DataFrames.
- Algorithms: Covers computational and memory complexity, along with recursion, backtracking, dynamic programming, and greedy algorithms.
- Numerical Linear Algebra: Introduces algorithms for linear algebra, including linear equations, eigenvectors, and more.
- Optimization: Introduces algorithms for solving optimization problems, including convex optimization, stochastic optimization, and non-convex optimization.
- Deep Learning: Introduces PyTorch and deep learning architectures, including neural networks, transformers, and more.
Tips to learn this course
- Practice: To understand the code, you should always implement it yourself.
- Customize: All the workflows, algorithm strategies, and code styles are only suggestions. You should understand the rationale behind these suggestions and customize them for your own work.
- Seek help: The open-source community is always there to help, so do not hesitate to ask. You should also try to help others.
- Always explore: We cannot cover all the details in the course. As mentioned above, the course emphasizes breadth over depth, so you should explore more deeply in the directions you are interested in. Because this is a computing-oriented course, we will not cover much math, especially theory. You should also explore the theory behind the algorithms and methods you are interested in.
- Adaptively skip: The course materials are designed to be beginner-friendly. You can skip materials that you are already familiar with (or not interested in).
- Abstract black boxes: Speaking of skipping, one of the most important philosophies in coding is "modular programming". What matters most is knowing how to decompose your problem into smaller, self-contained modules that encapsulate their implementation details, whether those details are written by you or provided by an existing Application Programming Interface (API). You do not always need to open the black box.
Tip: What is the real coding skill?
Computing is such a broad field that you will never be prepared for everything. The most important coding skill is the ability to decompose your problem into smaller problems, to identify the toolboxes that could solve them, and to digest those tools efficiently. This is also the scope of this course: to offer you the "menu" more than the "recipe", and meta-learning more than specific learning.
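As a small illustration of the modular-programming idea in the tips above, here is a minimal sketch in Python. The function names (`clean`, `summarize`, `report`, `analyze`) and the toy data are hypothetical and chosen only for this example. The sketch assumes missing values are encoded as `None`, and it treats the standard-library `statistics` module as a black box.

```python
from statistics import mean, stdev


def clean(values):
    """Drop missing entries (None) and convert the rest to float."""
    return [float(v) for v in values if v is not None]


def summarize(values):
    """Return the sample mean and sample standard deviation.

    The arithmetic is delegated to the standard-library `statistics`
    module; we use it without opening the black box.
    """
    return {"mean": mean(values), "sd": stdev(values)}


def report(summary):
    """Format a summary dictionary as a one-line string."""
    return f"mean={summary['mean']:.2f}, sd={summary['sd']:.2f}"


def analyze(raw_values):
    """Compose the three small modules into one workflow."""
    return report(summarize(clean(raw_values)))


if __name__ == "__main__":
    print(analyze([1, 2, None, 3, 4]))  # prints "mean=2.50, sd=1.29"
```

Each function can be tested, replaced, or skipped independently. For example, `summarize` could later call a different library without changing `clean` or `report`.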
Acknowledgements
This course uses materials from many fantastic resources, and we thank their authors. Here is the list of resources we have used:
- MIT Missing Semester
- Hello Algorithms
- INFO550 Data Science Toolkit
- Harvard Chan Bioinformatics Core Training
- Berkeley Statistics Tutorials
We also reference many textbooks which are listed in the Reference.
If any authors have concerns regarding the use of their materials or copyright issues, please contact us at GitHub Issues. We are committed to respecting intellectual property rights and will promptly address any concerns.