z - Advanced topic: Reproducible Analytical Pipelines with Nix

Introduction

Isolated environments are well suited to running pipelines safely and reproducibly. This vignette details how to build a reproducible analytical pipeline using an environment built with Nix that contains the right versions of R and of the required packages.

An example of a reproducible analytical pipeline using Nix

Suppose that you’ve used {targets} to build a pipeline for a project and that you did so using a tailor-made Nix environment. Here is the call to rix() that you could have used to build that environment:

library(rix)

# Write default.nix to a temporary directory for this example
path_default_nix <- tempdir()

rix(
  r_ver = "4.2.2",
  r_pkgs = c("targets", "tarchetypes", "rmarkdown"),
  system_pkgs = NULL,
  git_pkgs = list(
    package_name = "housing",
    repo_url = "https://github.com/rap4all/housing/",
    commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"
  ),
  ide = "other",
  project_path = path_default_nix,
  overwrite = TRUE
)

This call to rix() generates the following default.nix file:

#> # This file was generated by the {rix} R package v0.13.4 on 2024-11-20
#> # with following call:
#> # >rix(r_ver = "8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8",
#> #  > r_pkgs = c("targets",
#> #  > "tarchetypes",
#> #  > "rmarkdown"),
#> #  > system_pkgs = NULL,
#> #  > git_pkgs = list(package_name = "housing",
#> #  > repo_url = "https://github.com/rap4all/housing/",
#> #  > commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"),
#> #  > ide = "other",
#> #  > project_path = path_default_nix,
#> #  > overwrite = TRUE)
#> # It uses nixpkgs' revision 8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8 for reproducibility purposes
#> # which will install R version 4.2.2.
#> # Report any issues to https://github.com/ropensci/rix
#> let
#>  pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/8ad5e8132c5dcf977e308e7bf5517cc6cc0bf7d8.tar.gz") {};
#>  
#>   rpkgs = builtins.attrValues {
#>     inherit (pkgs.rPackages) 
#>       rmarkdown
#>       tarchetypes
#>       targets;
#>   };
#>  
#>   git_archive_pkgs = [
#>     (pkgs.rPackages.buildRPackage {
#>       name = "housing";
#>       src = pkgs.fetchgit {
#>         url = "https://github.com/rap4all/housing/";
#>         rev = "1c860959310b80e67c41f7bbdc3e84cef00df18e";
#>         sha256 = "sha256-s4KGtfKQ7hL0sfDhGb4BpBpspfefBN6hf+XlslqyEn4=";
#>       };
#>       propagatedBuildInputs = builtins.attrValues {
#>         inherit (pkgs.rPackages) 
#>           dplyr
#>           ggplot2
#>           janitor
#>           purrr
#>           readxl
#>           rlang
#>           rvest
#>           stringr
#>           tidyr;
#>       };
#>     })
#>    ];
#>    
#>   system_packages = builtins.attrValues {
#>     inherit (pkgs) 
#>       R
#>       glibcLocales
#>       nix;
#>   };
#>   
#> in
#> 
#> pkgs.mkShell {
#>   LOCALE_ARCHIVE = if pkgs.system == "x86_64-linux" then "${pkgs.glibcLocales}/lib/locale/locale-archive" else "";
#>   LANG = "en_US.UTF-8";
#>    LC_ALL = "en_US.UTF-8";
#>    LC_TIME = "en_US.UTF-8";
#>    LC_MONETARY = "en_US.UTF-8";
#>    LC_PAPER = "en_US.UTF-8";
#>    LC_MEASUREMENT = "en_US.UTF-8";
#> 
#>   buildInputs = [ git_archive_pkgs rpkgs  system_packages   ];
#>   
#> }

The environment that gets built from this default.nix file contains R version 4.2.2, the {targets}, {tarchetypes} and {rmarkdown} packages, as well as the {housing} package, which is hosted only on GitHub and provides some data and useful functions for the project. Because it is hosted on GitHub, it gets installed using the buildRPackage function from Nix. You can use this environment to work on your project, or to launch a {targets} pipeline. This GitHub repository contains the finalized project.

On your local machine, you could execute the pipeline in the environment by running this in a terminal:

cd /absolute/path/to/housing/ && nix-shell default.nix --run "Rscript -e 'targets::tar_make()'"
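If you run this command often, it can help to wrap it in a small shell function that fails early when the project folder has no default.nix. The following is only a sketch: the function name run_pipeline and its single argument are hypothetical and not part of {rix}; the nix-shell invocation is the same as the one above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of {rix}): run the {targets} pipeline
# inside the Nix environment defined by the project's default.nix.
run_pipeline() {
  # Project folder; defaults to the current directory.
  local project_dir="${1:-.}"

  # Fail early if there is no default.nix to build the environment from.
  if [ ! -f "$project_dir/default.nix" ]; then
    echo "error: no default.nix found in $project_dir" >&2
    return 1
  fi

  # Use a subshell so the caller's working directory is left unchanged.
  (
    cd "$project_dir" &&
      nix-shell default.nix --run "Rscript -e 'targets::tar_make()'"
  )
}
```

After sourcing the function, `run_pipeline /absolute/path/to/housing/` behaves like the one-liner, and the function returns a non-zero status if default.nix is missing or if the pipeline fails.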

If you wish to run the pipeline whenever you drop into the Nix shell, you can add a shell hook to the generated default.nix file by passing the shell_hook argument to rix():

path_default_nix <- tempdir()

rix(
  r_ver = "4.2.2",
  r_pkgs = c("targets", "tarchetypes", "rmarkdown"),
  system_pkgs = NULL,
  git_pkgs = list(
    package_name = "housing",
    repo_url = "https://github.com/rap4all/housing/",
    commit = "1c860959310b80e67c41f7bbdc3e84cef00df18e"
  ),
  ide = "other",
  shell_hook = "Rscript -e 'targets::tar_make()'",
  project_path = path_default_nix,
  overwrite = TRUE
)

Now, each time you drop into the Nix shell for that project using nix-shell, the pipeline gets executed automatically. {rix} also features a function called tar_nix_ga() that adds a GitHub Actions workflow file to make the pipeline run automatically on GitHub Actions. The GitHub repository linked above has such a file, so each time changes get pushed, the pipeline runs on GitHub Actions and the results are automatically pushed to a branch called targets-runs. See the workflow file here. This feature is heavily inspired by, and adapted from, the targets::github_actions() function.
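To look at the results that the workflow pushed, you can check out the targets-runs branch in a local clone of the project. The sketch below assumes that the GitHub remote is named origin (the default after git clone); the function name fetch_pipeline_results is hypothetical and only used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch a local clone to the branch holding the
# results pushed by the GitHub Actions workflow.
# Assumes the GitHub remote is named "origin".
fetch_pipeline_results() {
  local branch="targets-runs"  # branch name used by the workflow

  # Download the latest state of the results branch from the remote.
  git fetch origin "$branch" || return 1

  # Check out the branch; on first use git creates a local branch
  # that tracks origin/targets-runs.
  git checkout "$branch"
}
```

Run it from inside the cloned project folder. If the local branch already exists from an earlier run, follow up with git pull to bring it up to date.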