R Version Control
Learning Objectives
- Learn to use here for consistent file paths
- Learn to use renv for reproducible environments
- Develop best practices for R project management
Why Package Management?
Package management in R is crucial for:
- Reproducibility: Ensure the same package versions are used on every machine
- Dependency Tracking: Keep track of all required packages
- Project Isolation: Avoid conflicts between different projects
- Collaboration: Make it easy for others to recreate your environment
No Absolute Paths with here
The here package solves the problem of absolute paths by providing a consistent way to reference files relative to your project root.
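To make the idea of a "project root" concrete, the following base R sketch walks up from a starting directory until it finds a marker file. It is a simplified illustration, not here's actual implementation: the function name find_project_root and the marker list are assumptions chosen for this example.

```r
# Simplified illustration of project-root detection (not here's real code).
# find_project_root() and the marker list are hypothetical.
find_project_root <- function(start = getwd(),
                              markers = c(".here", ".git", "DESCRIPTION")) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    # Stop at the first directory that contains any marker
    if (any(file.exists(file.path(dir, markers)))) {
      return(dir)
    }
    parent <- dirname(dir)
    # dirname() of the filesystem root returns the root itself
    if (identical(parent, dir)) {
      stop("No project root found above ", start)
    }
    dir <- parent
  }
}

# Demo in a throwaway directory tree with a ".here" marker at its top
demo_root <- file.path(tempdir(), "demo_project")
dir.create(file.path(demo_root, "R", "helpers"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_root, ".here"))

# Starting two levels down still resolves to the same root
root_from_nested <- find_project_root(file.path(demo_root, "R", "helpers"))
data_path <- file.path(root_from_nested, "data", "my_data.csv")
print(data_path)
```

Because the path is always built from the detected root, a script produces the same file path whether it is run from the project directory or from a subdirectory.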
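Setting Up here
- Install the package: install.packages("here")
- In each R script or Rmd file, start with a here::i_am() call that gives the file's own path relative to the project root, for example here::i_am("R/analysis.R")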
Using here
Instead of absolute paths or relative paths that might break, use here::here():
# Bad practice
data <- read.csv("/Users/me/Projects/my_project/data/my_data.csv")
# Good practice
here::i_am('R/analysis.R')
data <- read.csv(here::here('data', 'my_data.csv'))
Example Script Structure
# In R/01_fit_models.R
here::i_am('R/01_fit_models.R')
# Load data
data <- read.csv(here::here('data', 'cleaned_data.csv'))
# Save results (assumes the model-fitting code above created an object named results)
save(results, file = here::here('output', 'model_results.RData'))
Version Control with renv
The renv package manages project-specific package dependencies, ensuring reproducibility across different environments.
Initial Setup
- Install renv: install.packages("renv")
- Initialize in your project: renv::init()
This creates:
- renv.lock: Records package versions
- .Rprofile: Activates renv
- renv/: Contains project library
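A quick way to confirm that a directory has been initialised is to check for these three items. The helper below is a minimal sketch in base R; renv_files_present is a hypothetical name, and the demo builds a mock directory rather than calling renv itself.

```r
# Hypothetical helper: report which renv files exist in a project directory.
renv_files_present <- function(project = ".") {
  paths <- c("renv.lock", ".Rprofile", "renv")
  present <- file.exists(file.path(project, paths))
  names(present) <- paths
  present
}

# Demo on a throwaway directory that mimics a freshly initialised project
demo_project <- file.path(tempdir(), "renv_demo")
dir.create(file.path(demo_project, "renv"),
           recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_project, "renv.lock"))
writeLines('source("renv/activate.R")', file.path(demo_project, ".Rprofile"))

status <- renv_files_present(demo_project)
print(status)
```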
Key renv Commands
# Install packages recorded in renv.lock
renv::restore()
# Update renv.lock with current packages
renv::snapshot()
# Remove a package
renv::remove("package_name")
# Check status
renv::status()
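These commands all reconcile two things: the versions recorded in renv.lock and the versions installed in the project library. The base R sketch below illustrates that comparison on made-up data. It is not how renv works internally; compare_versions and the version numbers are assumptions for illustration only.

```r
# Hypothetical helper: compare versions recorded in a lockfile with versions
# installed in the library. Both inputs are named character vectors
# (names = package, values = version string).
compare_versions <- function(locked, installed) {
  pkgs <- union(names(locked), names(installed))
  lock_v <- unname(locked[pkgs])      # NA where a package is not recorded
  inst_v <- unname(installed[pkgs])   # NA where a package is not installed
  state <- ifelse(is.na(lock_v), "installed but not recorded",
           ifelse(is.na(inst_v), "recorded but not installed",
           ifelse(lock_v == inst_v, "in sync", "version differs")))
  data.frame(package = pkgs, locked = lock_v, installed = inst_v,
             state = state, stringsAsFactors = FALSE)
}

# Illustrative data only
locked    <- c(dplyr = "1.1.4", ggplot2 = "3.4.4", here = "1.0.1")
installed <- c(dplyr = "1.1.4", ggplot2 = "3.5.0", readr = "2.1.5")

report <- compare_versions(locked, installed)
print(report)
```

Roughly speaking, renv::restore() resolves differences in favour of the lockfile, while renv::snapshot() resolves them in favour of the library.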
Collaborative Workflow with renv
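- Initial Setup (User A): run renv::init(), then commit renv.lock, .Rprofile, and renv/activate.R
- New Collaborator (User B): clone the repository and run renv::restore()
- Adding New Packages (Any User): install with renv::install("package_name"), run renv::snapshot(), and commit the updated renv.lock
- Syncing Changes (All Users): pull the latest changes and run renv::restore()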
Recommended Workflow
- Project Setup:
# Create a new RStudio project, then install the tooling
# (usethis is needed for the use_*() calls below)
install.packages(c("renv", "here", "usethis"))
# Initialize renv and here
renv::init()
library(here)
# Document dependencies in DESCRIPTION file
usethis::use_description()
# Set up version control
usethis::use_git()
# Create standard directory structure
dir.create(here("data", "raw"), recursive = TRUE)
dir.create(here("data", "processed"), recursive = TRUE)
dir.create(here("R"))
dir.create(here("results", "figures"), recursive = TRUE)
- Development Workflow:
# Start new analysis script
library(here)
library(tidyverse)
# Use here for all file paths
data <- read_csv(here("data", "raw", "input.csv"))
# Install new package if needed
renv::install("newpackage")
# Work on your analysis (clean_data() stands in for your own cleaning function)
processed_data <- clean_data(data)
write_csv(processed_data, here("data", "processed", "clean.csv"))
# Save results using here (ggsave() writes the most recently displayed plot)
ggsave(here("results", "figures", "analysis_plot.pdf"))
# Update lock file
renv::snapshot()
# Document dependencies in DESCRIPTION
usethis::use_package("newpackage")
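- Collaboration Workflow: pull the latest changes, run renv::restore() to match the lockfile, and commit renv.lock after every renv::snapshot()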
- Deployment/Sharing:
# Ensure all paths use here()
# Check no absolute paths remain (run these grep commands in a shell, not in R):
grep -rn "setwd\\|read\\.csv(" R/
grep -rn "[\"'][A-Za-z]:[/\\\\]" R/   # quoted Windows paths
grep -rn "[\"']/" R/                   # quoted Unix paths
# Ensure all dependencies are documented
renv::snapshot()
# Clean up unused packages
renv::clean()
# Share the project together with renv.lock; recipients run renv::restore()
# (renv does not export a bundle() function)
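The same check can be done from within R, which helps on systems without grep. The sketch below is a heuristic in base R: find_absolute_paths is a hypothetical helper, and its pattern only flags setwd() calls and string literals that begin with / or a drive letter, so it can miss cases or report false positives.

```r
# Hypothetical helper: flag lines in R scripts that call setwd() or contain a
# quoted absolute path (Unix "/..." or Windows "C:/..." or "C:\...").
find_absolute_paths <- function(dir = "R") {
  files <- list.files(dir, pattern = "\\.[Rr]$",
                      full.names = TRUE, recursive = TRUE)
  pattern <- "setwd\\(|[\"'](/|[A-Za-z]:[/\\\\])"
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(pattern, lines)
    if (length(idx) == 0) return(NULL)
    data.frame(file = f, line = idx, code = trimws(lines[idx]),
               stringsAsFactors = FALSE)
  })
  hits <- do.call(rbind, hits)
  if (is.null(hits)) {
    # Return an empty table rather than NULL when nothing is flagged
    hits <- data.frame(file = character(), line = integer(),
                       code = character(), stringsAsFactors = FALSE)
  }
  hits
}

# Demo: one script with problems, one that uses here::here()
demo_dir <- file.path(tempdir(), "path_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c(
  'data <- read.csv("/Users/me/Projects/my_project/data/my_data.csv")',
  'setwd("C:/work")'
), file.path(demo_dir, "bad.R"))
writeLines(
  'data <- read.csv(here::here("data", "my_data.csv"))',
  file.path(demo_dir, "good.R")
)

hits <- find_absolute_paths(demo_dir)
print(hits[, c("line", "code")])
```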
Advanced Tips
1. Using Multiple R Versions
Use a .Rversion file:
2. Custom Package Sources
In renv.lock:
{
  "R": {
    "Version": "4.1.2",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      },
      {
        "Name": "Custom",
        "URL": "https://mycran.org"
      }
    ]
  }
}
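Rather than editing the lockfile by hand, the repositories can be set through R's repos option, which renv records when a snapshot is taken. The sketch below only sets and inspects the option; the Custom name and https://mycran.org URL are the placeholders from the lockfile above, not a real repository.

```r
# Illustrative only: "Custom" and https://mycran.org are placeholders taken
# from the lockfile example, not a real repository.
options(repos = c(
  CRAN   = "https://cloud.r-project.org",
  Custom = "https://mycran.org"
))

repos <- getOption("repos")
print(repos)

# In a project where renv is active, calling renv::snapshot() afterwards
# would write these repositories into renv.lock.
```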
3. Continuous Integration
Example GitHub Actions workflow:
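on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      # setup-renv installs renv and restores the packages in renv.lock
      - uses: r-lib/actions/setup-renv@v2
      # R code must be run through Rscript; a bare renv::restore() is not a shell command
      - run: Rscript -e "renv::restore()"
      - run: Rscript -e "source('analysis/analysis.R')"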