Git and GitHub
Learning Objectives
- Understand why version control is essential for reproducible research
- Learn basic Git commands and workflows
- Learn how to collaborate using GitHub
- Develop best practices for version control in scientific computing
Why Version Control?
Version control is essential for:
- Tracking Changes: Keep a complete history of your project
- Collaboration: Work effectively with others
- Backup: Never lose your work once it is pushed to a remote repository such as GitHub
- Reproducibility: Return to any previous state of your project
- Documentation: Understand why changes were made
Git Basics
If you are using VS Code, you can check the tutorial here.
Setting Up Git
First-time setup:
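A typical first-time setup stores your identity in your global Git configuration, once per computer. "Your Name" and you@example.com are placeholders, and the default-branch line is an optional extra (Git 2.28 or newer) that matches the main branch used later in this guide.

```shell
# Tell Git who you are; this is recorded in every commit you make.
# Replace the placeholders with your own name and email
# (use the email of your GitHub account so commits link to your profile).
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Optional: make "main" the default branch name for new repositories (Git 2.28+)
git config --global init.defaultBranch main

# Review your global settings
git config --global --list
```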
Initialize a repository once per project, when you create it:
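A minimal sketch of creating a new repository, assuming a project folder called my_project (a hypothetical name) and Git 2.28 or newer for the -b option:

```shell
# Run once per project, from the folder where the project should live.
# -b main names the first branch "main" (needs Git 2.28 or newer).
mkdir -p my_project
cd my_project
git init -b main
```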
Basic Git Workflow
- Check Status
- Stage Changes
- Commit Changes
- View History
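The four steps above map onto the commands in this sketch. To keep it safe to run anywhere, it works in a throwaway temporary folder, sets a demo identity for that repository only, and uses a hypothetical file analysis.py; save it to a file and run it with bash.

```shell
set -e
# Throwaway demo repository in a temporary folder
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"        # local identity for this demo only
git config user.email "user@example.com"

echo "print('hello')" > analysis.py

# 1. Check status: analysis.py is listed as untracked
git status

# 2. Stage changes: add the file to the next commit
git add analysis.py        # "git add ." would stage every change under this folder

# 3. Commit changes with a descriptive message
git commit -m "Add analysis script"

# 4. View history, one line per commit
git log --oneline
```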
Best Practices for Commits
- Commit Often: Make small, logical commits
- Write Clear Messages: Use descriptive commit messages
Best practices for Git differ between tasks. For example, software engineering projects often prefer to group related changes into one commit and to test the code before committing. To avoid leaving half-done work in separate commits, you can use git commit --amend --no-edit to add forgotten changes to your last commit, but only if you have not yet shared (pushed) that commit with others. As a Git beginner, focus on the two points above and gradually find the practices that work best for you.
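A small demonstration of git commit --amend --no-edit in a throwaway repository; the file names model.py and config.yaml are hypothetical. The forgotten file ends up inside the original commit instead of in a new one.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"        # local identity for this demo only
git config user.email "user@example.com"

echo "x = 1" > model.py
git add model.py
git commit -q -m "Add model"

# Oops: config.yaml belonged in that commit too
echo "lr: 0.01" > config.yaml
git add config.yaml
# Fold the staged change into the last commit, keeping its message.
# Only do this while the commit has not been pushed or shared.
git commit -q --amend --no-edit

git log --oneline          # still a single commit, "Add model"
git show --stat HEAD       # ...which now contains both files
```

Amending replaces the last commit with a new one (it gets a new hash), which is why it is only safe before anyone else has that commit.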
GitHub
You can read the official GitHub documentation for its suggested collaboration workflow.
Tools like GitHub Desktop and the GitHub integration in VS Code can help you manage your repositories more conveniently. You can also build your own workflows with tools like GNU Make combined with the GitHub command-line tool gh.
GitHub beyond repositories
GitHub is not just a place to host your code. It is also a social network for developers. You can follow others, see what they are working on, star their repositories, and collaborate with them.
GitHub can also serve as your resume and help you get a job: imagine an employer visiting your GitHub profile and seeing all the 1k-star projects you have worked on.
Setting Up GitHub
- Create a GitHub account at github.com
- Set up SSH keys for secure authentication:
- Add the public key to your GitHub account
- (Optional) Install the GitHub command-line tool gh
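A sketch of the SSH key step using OpenSSH's ssh-keygen, assuming the default key path ~/.ssh/id_ed25519 and a placeholder email. Only the public key (the .pub file) is uploaded to GitHub; the gh line is an optional alternative that assumes gh is installed and logged in.

```shell
# Generate an Ed25519 key pair, unless one already exists at this path.
# Replace the -C label with your GitHub email. -N "" sets an empty
# passphrase; drop it to be prompted for a passphrase instead
# (recommended on shared machines).
KEY="$HOME/.ssh/id_ed25519"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$KEY" ] || ssh-keygen -t ed25519 -C "you@example.com" -f "$KEY" -N ""

# Print the PUBLIC key; never share the private key file (the one without .pub).
# Paste this line into GitHub: Settings > SSH and GPG keys > New SSH key.
cat "$KEY.pub"

# Alternative, if gh is installed and logged in:
#   gh ssh-key add "$KEY.pub" --title "my laptop"
# Then test the connection (requires network access):
#   ssh -T git@github.com
```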
GitHub Collaboration Workflow
You can follow the VS Code tutorial here for the GitHub collaboration workflow in VS Code. We suggest the following workflow for contributing to public repositories. You can use Make to automate the workflow instead of memorizing all the commands.
The figures below visualize how each step of the workflow changes the commit state on remote GitHub, in your local Git repository, and on your local disk. Cross-reference them with the text instructions below for a better understanding. Note that the text instructions include a fork step, which you need when contributing to a public repository that you do not have permission to push to. If you have push access to the original repository, you can follow the simplified instructions in the figures instead.
Initial Repository Setup and Branch Creation
- Fork the Repository
  - Go to the original repository on GitHub
  - Click the "Fork" button in the top-right corner
  - This creates your own copy of the repository under your GitHub account
- Clone Your Forked Repository
  - Clone your fork (not the original repository)
- Add Original Repository as Remote
  - Add the original repository as a remote called "upstream"
  - You now have two remotes:
    - origin: your fork
    - upstream: the original repository
- Create a New Working Branch
  - Every new feature should be developed in a new branch
  - Execute git checkout -b my_feature to create and switch to a new branch
  - The new branch starts from your current commit, so your feature work stays separate from main
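These setup steps can be rehearsed without touching GitHub. In this sketch, two local bare repositories in a temporary folder (hypothetical names upstream.git and fork.git) stand in for the original repository and your fork, and a bare copy plays the role of the Fork button. The comments show the real-world commands, where YOUR_USERNAME, ORIGINAL_OWNER, and REPO are placeholders. Run it with bash; Git 2.28 or newer is assumed for init -b.

```shell
set -e
# --- Sandbox: local bare repositories stand in for GitHub -------------------
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
SANDBOX="$(mktemp -d)"
cd "$SANDBOX"
UPSTREAM="$SANDBOX/upstream.git"   # plays the original repository
FORK="$SANDBOX/fork.git"           # plays your fork
git init -q --bare -b main "$UPSTREAM"
git init -q -b main seed
(cd seed && echo "# Demo" > README.md && git add README.md \
  && git commit -q -m "Initial commit" && git push -q "$UPSTREAM" main)
# Fork: on GitHub this is the Fork button; here, a bare copy of the original
git clone -q --bare "$UPSTREAM" "$FORK"
# ------------------------------------------------------------------------------

# Clone your fork (not the original repository).
# Real world: git clone git@github.com:YOUR_USERNAME/REPO.git
git clone -q "$FORK" project
cd project

# Add the original repository as a remote called "upstream".
# Real world: git remote add upstream git@github.com:ORIGINAL_OWNER/REPO.git
git remote add upstream "$UPSTREAM"
git remote -v      # origin = your fork, upstream = original repository

# Create a new branch for your feature and switch to it
git checkout -b my_feature
```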
Code Development and Local Changes
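- Modify Local Code
  - Make the necessary changes to the source files
- Review Code Changes
  - Use git diff to inspect your modifications before committing
- Commit Local Changes
  - Use git commit -a -m "Descriptive message about changes"
  - The -a flag stages all modified files that Git already tracks; new files must first be added with git add
- Push Branch to Your Fork
  - Execute git push -f origin my_feature
  - The -f (force) flag overwrites the branch on your fork. It is harmless for the first push and required after a rebase, but never force-push a branch that others also work on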
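The same kind of local sandbox can rehearse the development steps. Here a bare repository fork.git in a temporary folder stands in for your fork on GitHub, and analysis.py and the commit message are hypothetical. Run it with bash.

```shell
set -e
# --- Sandbox: a local bare repository plays your fork on GitHub --------------
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
SANDBOX="$(mktemp -d)"
git init -q --bare -b main "$SANDBOX/fork.git"
git init -q -b main "$SANDBOX/project"
cd "$SANDBOX/project"
git remote add origin "$SANDBOX/fork.git"
echo 'print("hello")' > analysis.py
git add analysis.py && git commit -q -m "Initial commit"
git push -q origin main
git checkout -q -b my_feature
# ------------------------------------------------------------------------------

# Modify local code
echo 'print("hello, world")' > analysis.py

# Review code changes: unstaged edits compared with the last commit
git diff

# Commit all modified tracked files (-a) with a descriptive message (-m)
git commit -a -m "Greet the whole world in analysis.py"

# Push the branch to your fork, as in the text above
git push -f origin my_feature
```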
Handling Remote Repository Updates
- Switch to Main Branch
  - Use git checkout main to return to the primary branch
  - Warning: Commit your work on my_feature before switching to main. Uncommitted changes do not belong to any branch: Git either carries them over to main or refuses to switch if they conflict.
- Pull Latest Changes
  - Execute git pull upstream main to update your local main branch with the latest changes from the original repository
- Return to Working Branch
  - Switch back to the my_feature branch using git checkout my_feature
- Rebase Working Branch
  - Use git rebase main to integrate the main branch updates into your working branch; your commits are replayed on top of the updated main
  - Note: if main and your branch changed the same lines, the rebase stops with a merge conflict, and you must choose which code to keep manually
  - Understand the difference between git pull and git rebase here
- Force Push Updated Branch
  - Execute git push -f origin my_feature to update your fork with the rebased branch
  - A force push is needed because rebasing rewrites the branch history, so a normal push would be rejected
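The update steps are easiest to follow when main actually moves. In this sandbox sketch, local bare repositories again stand in for GitHub, and a simulated collaborator (a hypothetical second clone) pushes a new commit to the original repository before you rebase. Run it with bash.

```shell
set -e
# --- Sandbox: local bare repos play the original repository and your fork ----
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
SANDBOX="$(mktemp -d)"
git init -q --bare -b main "$SANDBOX/upstream.git"
git init -q --bare -b main "$SANDBOX/fork.git"
git init -q -b main "$SANDBOX/project"
cd "$SANDBOX/project"
git remote add origin "$SANDBOX/fork.git"
git remote add upstream "$SANDBOX/upstream.git"
echo "# Demo" > README.md
git add README.md && git commit -q -m "Initial commit"
git push -q upstream main && git push -q origin main
# Your feature work, committed and already pushed to your fork
git checkout -q -b my_feature
echo "def feature(): return 1" > feature.py
git add feature.py && git commit -q -m "Add feature"
git push -q origin my_feature
# Meanwhile, a collaborator pushes a new commit to the original repository
git clone -q "$SANDBOX/upstream.git" "$SANDBOX/collaborator"
(cd "$SANDBOX/collaborator" && echo "MIT" > LICENSE && git add LICENSE \
  && git commit -q -m "Add license" && git push -q origin main)
# ------------------------------------------------------------------------------

git checkout main                 # 1. switch to main (my_feature is committed)
git pull upstream main            # 2. bring in the collaborator's commit
git checkout my_feature           # 3. return to the working branch
git rebase main                   # 4. replay "Add feature" on top of the new main
git push -f origin my_feature     # 5. rebase rewrote history, so force-push

git log --oneline --graph         # newest first: Add feature, Add license, Initial commit
```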
Creating Pull Request
- Create Pull Request
  - Go to the original repository on GitHub and click "Pull Requests"
  - Click "New Pull Request"
  - Choose "compare across forks"
  - Select your fork and feature branch
  - Fill in the PR description and submit
- Merge Request
  - The project owner can use "Squash and merge" in the GitHub pull request to consolidate all of its commits into a single commit on main
  - Check out how "squash and merge" works here
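GitHub performs "Squash and merge" on its own servers, but the local command git merge --squash gives a comparable result and makes the effect visible: three small commits on my_feature become one new commit on main. The branch contents and commit messages below are hypothetical. Run it with bash.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"        # local identity for this demo only
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md && git commit -q -m "Initial commit"

# A feature branch with several small commits
git checkout -q -b my_feature
echo "a = 1" > feature.py
git add feature.py && git commit -q -m "WIP"
echo "b = 2" >> feature.py && git commit -q -a -m "Fix typo"
echo "c = 3" >> feature.py && git commit -q -a -m "Address review comments"

# Squash and merge: all branch changes become ONE new commit on main
git checkout -q main
git merge --squash my_feature     # stages the combined changes without committing
git commit -q -m "Add feature"

git log --oneline main            # two commits: "Add feature" and "Initial commit"
```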
Tip: Always rebase
- Always rebase your branch on the main branch before creating a pull request.
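- This makes your pull request history cleaner and easier to review.
- You can simply run git pull --rebase upstream main on your feature branch to rebase it onto the latest main branch of the original repository on GitHub. If you work directly in the original repository (without a fork), use origin instead of upstream.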
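A sandbox sketch of this one-command shortcut, assuming the fork layout used in this guide (the original repository configured as the upstream remote). A local bare repository plays the original repository, and a simulated collaborator updates its main branch while you work. Run it with bash.

```shell
set -e
# --- Sandbox: a local bare repo plays the original repository ("upstream") ---
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
SANDBOX="$(mktemp -d)"
git init -q --bare -b main "$SANDBOX/upstream.git"
git init -q -b main "$SANDBOX/project"
cd "$SANDBOX/project"
git remote add upstream "$SANDBOX/upstream.git"
echo "# Demo" > README.md
git add README.md && git commit -q -m "Initial commit"
git push -q upstream main
git checkout -q -b my_feature
echo "def feature(): return 1" > feature.py
git add feature.py && git commit -q -m "Add feature"
# A collaborator updates main in the original repository
git clone -q "$SANDBOX/upstream.git" "$SANDBOX/collaborator"
(cd "$SANDBOX/collaborator" && echo "MIT" > LICENSE && git add LICENSE \
  && git commit -q -m "Add license" && git push -q origin main)
# ------------------------------------------------------------------------------

# Still on my_feature: fetch upstream's main and replay your commits on top of it
git pull --rebase upstream main
git log --oneline    # newest first: Add feature, Add license, Initial commit
```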
Advanced Git Usage
Resolving Merge Conflicts
We recommend using VS Code to resolve conflicts; see the tutorial here. You can also watch this video tutorial.
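As a command-line complement to the VS Code tutorial, the sketch below provokes a conflict during git rebase main, the step of the workflow where conflicts most often appear, and resolves it by hand. The file config.py and its values are hypothetical; GIT_EDITOR=true simply accepts the existing commit message instead of opening an editor. Run it with bash.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"        # local identity for this demo only
git config user.email "user@example.com"
echo "learning_rate = 0.1" > config.py
git add config.py && git commit -q -m "Initial config"

# Your branch and main change the same line differently
git checkout -q -b my_feature
echo "learning_rate = 0.01" > config.py
git commit -q -a -m "Lower learning rate"
git checkout -q main
echo "learning_rate = 0.5" > config.py
git commit -q -a -m "Raise learning rate"

# Rebasing now stops at the conflict (non-zero exit status, hence || true)
git checkout -q my_feature
git rebase main || true
git status --short     # "UU config.py" marks the unmerged file
cat config.py          # shows the <<<<<<<, =======, >>>>>>> conflict markers

# Resolve: write the version you want (here, keep the feature's value),
# mark the file as resolved, and let the rebase continue
echo "learning_rate = 0.01" > config.py
git add config.py
GIT_EDITOR=true git rebase --continue
```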
.gitignore
Create a .gitignore file to exclude:
- Large data files
- Sensitive information
- Generated files
- System files
Example .gitignore:
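One possible .gitignore for a research project, covering the four categories above, is written and checked by this sketch. The patterns and file names are illustrative assumptions, so adapt them to your project; git check-ignore -v reports which rule ignores a given path. Run it with bash.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
# Example project layout (hypothetical file names)
mkdir -p data results
touch data/raw.csv results/fig1.png .env .DS_Store analysis.py

cat > .gitignore <<'EOF'
# Large data files
data/
*.h5

# Sensitive information
.env
*.pem

# Generated files
results/
__pycache__/
*.pyc
*.log

# System files
.DS_Store
Thumbs.db
EOF

# Show which .gitignore line matches each path
git check-ignore -v data/raw.csv results/fig1.png .env .DS_Store

# Only the script and the .gitignore itself remain to be tracked
git status --short
```

A .gitignore only affects untracked files. If a file was committed before you added its pattern, git rm --cached <file> stops tracking it while leaving it on disk.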