Introduction to Targets

Or How I Learnt to Stop Worrying and Love the DAG

Michael Jones

2025-02-26

The one-minute pitch

Targets is an R package for declaratively specifying an analysis pipeline, tracking the dependencies between its steps, and re-running only the steps that are out of date.

It’s a build system for analytics

Problem Example

path/to/project$ ls
  my_data.xlsx
  work.R

Problem Example (Aside)

path/to/project$ ls -a
  my_data.xlsx
  work.R
  .RData

Problem Example

path/to/project$ ls
  my_data.xlsx
  auxiliary_data.csv
  output.qmd
  output.rmd
  output.html
  data_clean.R
  analysis.R
  plots.R
  data_process.R

Problem Example

path/to/project$ ls
  data/
    my_data.xlsx
    auxiliary_data.csv
  reports/
    output.qmd
    output.html
  01_data_clean.R
  02_data_process.R
  03_analysis.R
  04_plots.R

The Spectrum of Reproducibility

R code in the console

A saved script

A structured script/collection of scripts

A Targets Program

Docker/VMs

Reproducible Research - Problems

“I don’t know what these files are”

“How does this project fit together?”

“How to run this project?”

“Is everything up to date?”

“Well, it works on my machine!”

Reproducible Research - Targets

“I don’t know what these files are”

“How does this project fit together?”

“How to run this project?”

“Is everything up to date?”

“Well, it works on my machine!”

Targets helps

You in six months
Others you are working with
You in five minutes: no re-running steps unnecessarily

DAGs

Directed - Each line has a direction: flow goes from one point to the next

Acyclic - No loops: following the lines never brings you back to where you started

Graph - A collection of points connected with lines
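To make the three definitions concrete, here is a toy sketch in R. It is not how {targets} works internally. The simple pipeline used later in these slides is written as a list mapping each node to the nodes it depends on, and a made-up function `dag_levels()` groups the nodes into levels: a node joins a level once everything it depends on sits in an earlier level. Running the levels in order respects the direction of flow, and if no node can be placed, the graph must contain a loop, i.e. it is not acyclic.

```r
# Each node lists the nodes it depends on (the lines pointing into it).
# Node names mirror the simple pipeline; `deps` is a made-up illustration.
deps <- list(
  data_path = character(0),
  data      = "data_path",
  graph     = "data",
  analysis  = "data",
  report    = c("graph", "analysis")
)

# Group nodes into levels: a node joins the next level once everything it
# depends on has been placed. Every dependency must itself be a node in `deps`.
dag_levels <- function(deps) {
  placed <- character(0)
  remaining <- names(deps)
  levels <- list()
  while (length(remaining) > 0) {
    is_ready <- vapply(remaining, function(n) all(deps[[n]] %in% placed), logical(1))
    if (!any(is_ready)) stop("no node can run next: the graph has a loop, so it is not a DAG")
    ready <- remaining[is_ready]
    levels[[length(levels) + 1]] <- ready
    placed <- c(placed, ready)
    remaining <- remaining[!is_ready]
  }
  levels
}

dag_levels(deps)
# list("data_path", "data", c("graph", "analysis"), "report")
```

Level 3 holds `graph` and `analysis`: neither depends on the other, which is what the parallel computing section relies on.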

Example DAG

Targets Basics

Getting Started - Project Format

path/to/project$ ls
  _targets.R
  functions.R

_targets.R: where you define your “plan”

functions.R: where you store helper functions

Targets will create a _targets/ directory to hold its data store; you don’t need to go in there.
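A minimal sketch of what `functions.R` might contain for the simple `_targets.R` on the following slides, assuming the input is a CSV with at least one numeric column. The bodies of `read_data()`, `plot_graph()`, `do_analysis()` and `make_report()` (and the helper `first_numeric()`) are hypothetical placeholders in base R; the point is that each step is an ordinary function that takes upstream values as arguments and returns a value.

```r
# functions.R: hypothetical helpers for the simple pipeline.
# Each takes upstream values as arguments and returns a value for {targets} to store.

# Read the raw data from a CSV file path.
read_data <- function(path) {
  read.csv(path)
}

# Pull out the first numeric column of a data frame.
first_numeric <- function(data) {
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  if (length(num_cols) == 0) stop("no numeric column found")
  data[[num_cols[1]]]
}

# Build (but do not draw) a histogram of the first numeric column.
# Returning an object rather than drawing it lets {targets} store it.
plot_graph <- function(data) {
  hist(first_numeric(data), plot = FALSE)
}

# A simple summary: count, mean and standard deviation.
do_analysis <- function(data) {
  x <- first_numeric(data)
  c(n = length(x), mean = mean(x), sd = sd(x))
}

# Combine upstream results into a one-line text report.
make_report <- function(graph, analysis) {
  sprintf("n = %d, mean = %.2f; histogram has %d bins",
          as.integer(analysis[["n"]]), analysis[["mean"]], length(graph$counts))
}
```

Because these are plain functions, each one can be tried in the console before it is wired into the plan.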

Targets - Concepts

Targets: the steps in your analysis. Each has a name and a command (usually a function call) that produces its result, and can depend on other targets
Plan: the collection of targets that make up your analysis
Cache: behind the scenes, {targets} stores the value of each target for you

Simple `_targets.R`

library(targets)

source("functions.R")

list(
  tar_target(data_path, "path/to/data.csv"),
  tar_target(data, read_data(data_path)),
  tar_target(graph, plot_graph(data)),
  tar_target(analysis, do_analysis(data)),
  tar_target(report, make_report(graph, analysis))
)

Simple `_targets.R`

tar_visnetwork()

Run the Pipeline

tar_make()

▶ dispatched target data_path
● completed target data_path [0 seconds, 68 bytes]
▶ dispatched target data
● completed target data [0 seconds, 1.091 kilobytes]
▶ dispatched target analysis
● completed target analysis [0 seconds, 44 bytes]
▶ dispatched target graph
● completed target graph [0.114 seconds, 88.617 kilobytes]
▶ dispatched target report
● completed target report [0 seconds, 44 bytes]
▶ ended pipeline [0.204 seconds]

Look at the graph:

Recall a Target

# Print it (the value is not kept)
tar_read(graph)

# Assign it to a name of your choosing
my_graph <- tar_read(graph)

# Load it in under the target's own name
graph <- tar_read(graph)
tar_load(graph) # equivalent

Pause

Steps in our process are defined with tar_target()

Each target is defined as the output of a function

We can see the status of the pipeline using tar_visnetwork()

We can run the whole pipeline with tar_make()

We can retrieve the values of completed targets with tar_read() or tar_load()
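Two more {targets} functions answer questions from the start of the talk: `tar_manifest()` lists each target with the command that builds it (“How does this project fit together?”), and `tar_outdated()` lists the targets the next `tar_make()` would rebuild (“Is everything up to date?”). A short sketch, assuming it is run from the project directory that holds `_targets.R`:

```r
library(targets)

# One row per target, with the command that produces it
tar_manifest()

# Names of targets that are out of date; character(0) means everything is up to date
tar_outdated()
```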

Errors

Handling Errors

tar_make()

▶ dispatched target analysis
▶ recorded workspace analysis
✖ errored target analysis
✖ errored pipeline [0.068 seconds]

Error:
! targets::tar_make() error
    • tar_errored()
    • tar_meta(fields = any_of("error"), complete_only = TRUE)
    • tar_workspace()
    • tar_workspaces()
    • Debug: https://books.ropensci.org/targets/debugging.html
    • Help: https://books.ropensci.org/targets/help.html
    analysis failed
    do_analysis(process, fail = TRUE)
    stop("analysis failed")
    .handleSimpleError(function (condition)  {     state$error <- build_mess...
    h(simpleError(msg, call))

Handling Errors

See which target failed from the graph
Read the error message
Try re-running the failing step outside the pipeline, interactively
Fix and move on
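The error message above already lists the helpers for these steps. A sketch of using them from the project directory, assuming, as in the log, that the failing target is `analysis` and that a workspace was recorded for it (the “recorded workspace analysis” line):

```r
library(targets)

# Which targets errored, and with what message
tar_errored()
tar_meta(fields = any_of("error"), complete_only = TRUE)

# Load what `analysis` needed (its upstream values, plus the packages and
# functions from _targets.R) into this session, then re-run its command by hand
tar_workspace(analysis)
do_analysis(process, fail = TRUE)
```

Once the function is fixed in `functions.R`, `tar_make()` picks up the change and re-runs `analysis` and anything downstream of it.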

Dependency Management

library(targets)

source("functions.R")

list(
  tar_target(data_path_1, "data_1.csv", format = "file"),
  tar_target(data_path_2, "data_2.csv", format = "file"),
  tar_target(data_1, read_data(data_path_1)),
  tar_target(data_2, read_data(data_path_2)),
  tar_target(process_1, run_process_1(data_1)),
  tar_target(process_2, run_process_2(data_2)),
  tar_target(report, make_report(process_1, process_2))
)

Look at the graph

tar_visnetwork(targets_only = TRUE)

Run the pipeline

tar_make()

▶ dispatched target data_path_1
● completed target data_path_1 [0 seconds, 194 bytes]
▶ dispatched target data_path_2
● completed target data_path_2 [0 seconds, 0 bytes]
▶ dispatched target data_1
● completed target data_1 [0 seconds, 44 bytes]
▶ dispatched target data_2
● completed target data_2 [0.001 seconds, 44 bytes]
▶ dispatched target process_1
● completed target process_1 [0.001 seconds, 44 bytes]
▶ dispatched target process_2
● completed target process_2 [0.001 seconds, 44 bytes]
▶ dispatched target report
● completed target report [0 seconds, 44 bytes]
▶ ended pipeline [0.072 seconds]

Run the pipeline

The contents of `data_1.csv` change

Re-Run the pipeline

tar_make()

▶ dispatched target data_path_1
● completed target data_path_1 [0.001 seconds, 194 bytes]
✔ skipped target data_path_2
▶ dispatched target data_1
● completed target data_1 [0 seconds, 44 bytes]
✔ skipped target data_2
✔ skipped target process_1
✔ skipped target process_2
✔ skipped target report
▶ ended pipeline [0.07 seconds]
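Note that `data_1` was rebuilt but `process_1` was skipped: a target only re-runs when something it depends on has actually changed, and the rebuilt `data_1` had the same value as before. The toy sketch below imitates that rule in base R with an in-memory cache. `run_step()`, `cache`, `clean()` and `summarise()` are made up for illustration, and {targets} itself compares hashes kept in `_targets/` rather than calling `identical()`.

```r
# Toy cache: for each step remember the function, its inputs, its value,
# and how many times it has actually been computed.
cache <- new.env()

run_step <- function(name, fun, ...) {
  inputs <- list(...)
  entry <- cache[[name]]  # NULL the first time a step is seen
  if (!is.null(entry) && identical(entry$fun, fun) && identical(entry$inputs, inputs)) {
    return(entry$value)  # nothing it depends on changed: skip
  }
  value <- fun(...)
  runs <- if (is.null(entry)) 1 else entry$runs + 1
  cache[[name]] <- list(fun = fun, inputs = inputs, value = value, runs = runs)
  value
}

clean     <- function(raw) sort(raw)  # stands in for read_data()
summarise <- function(d) sum(d)       # stands in for run_process_1()

# First run: both steps are computed
d <- run_step("data_1", clean, c(3, 1, 2))
p <- run_step("process_1", summarise, d)

# The raw input changes (same values, new order): data_1 is recomputed,
# but its value is unchanged, so process_1 is skipped
d <- run_step("data_1", clean, c(2, 3, 1))
p <- run_step("process_1", summarise, d)

c(data_1 = cache[["data_1"]]$runs, process_1 = cache[["process_1"]]$runs)
# data_1: 2 runs, process_1: 1 run
```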

Targets to generate Reports

Report Targets

Use library(tarchetypes) to get tar_quarto()
Include a tar_quarto() in the place of a tar_target()
Call tar_read() or tar_load() in your QMD file
{targets} detects those calls, so the report depends on the targets it reads

Report Targets

Pre-generate where you can: e.g. make graphs into targets and tar_read() them into your QMD, rather than putting the ggplot code in the QMD itself
End-to-end: one tar_make() takes you from raw data to the rendered report
Can produce any output format Quarto can, e.g. PowerPoint, HTML, Word
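A minimal sketch of a `_targets.R` with a report target, reusing the hypothetical helpers from the simple example and assuming a Quarto document `report.qmd` in the project directory (the {quarto} R package and the Quarto CLI need to be installed). Inside `report.qmd`, a code chunk would call `tar_read(graph)` and `tar_read(analysis)`; `tar_quarto()` finds those calls and makes `report` depend on `graph` and `analysis`.

```r
library(targets)
library(tarchetypes)  # provides tar_quarto()

source("functions.R")

list(
  tar_target(data_path, "path/to/data.csv", format = "file"),
  tar_target(data, read_data(data_path)),
  tar_target(graph, plot_graph(data)),
  tar_target(analysis, do_analysis(data)),
  # Renders report.qmd; re-renders when the .qmd changes or when a target
  # it reads with tar_read()/tar_load() changes
  tar_quarto(report, path = "report.qmd")
)
```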

Many Targets

Many Targets - Hand coded

library(targets)

source("functions.R")

list(
  tar_target(data, "data_1.csv", format = "file"),
  tar_target(analysis_1, do_analysis(data, value = 1)),
  tar_target(analysis_2, do_analysis(data, value = 2)),
  tar_target(analysis_3, do_analysis(data, value = 3)),
  tar_target(analysis_4, do_analysis(data, value = 4)),
  tar_target(analysis_5, do_analysis(data, value = 5)),
  tar_target(analysis_6, do_analysis(data, value = 6)),
  tar_target(report, make_report(analysis_1, analysis_2, analysis_3,
                                 analysis_4, analysis_5, analysis_6))
)

Many Targets - Hand coded

Many Targets - Let Targets do the work

library(targets)

source("functions.R")

list(
  tar_target(data, "data_1.csv", format = "file"),
  tar_target(values, 1:6),
  tar_target(analysis, do_analysis(data, values), pattern = map(values)),
  tar_target(report, make_report(analysis))
)

Pause

This is “dynamic branching”; there is also “static branching”
Patterns include map(), cross(), slice(), sample() and more
There are also several ways to combine the branches back into a single target, e.g. for a summary
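A sketch of two of these options, building on the dynamic-branching pipeline above. `cross()` creates one branch per combination of its inputs, and `iteration = "list"` makes the branches come back together as a list rather than being combined as a vector. The `methods` target and the `method` argument of `do_analysis()` are hypothetical.

```r
library(targets)

source("functions.R")

list(
  tar_target(data, "data_1.csv", format = "file"),
  tar_target(values, 1:6),
  tar_target(methods, c("mean", "median")),  # hypothetical second parameter
  # cross(): one branch per (value, method) pair, 6 x 2 = 12 branches
  tar_target(
    analysis,
    do_analysis(data, values, method = methods),  # `method` is hypothetical
    pattern = cross(values, methods),
    iteration = "list"  # downstream targets see a list of the 12 results
  ),
  tar_target(report, make_report(analysis))
)
```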

Many Targets - Let Targets do the work

Parallel Computing

Nodes at the same depth of a DAG (measured by the longest chain of dependencies leading to them) cannot depend on one another, so they can run at the same time
This makes parallelisation embarrassingly easy

Parallel Computing - Without

library(targets)

source("functions.R")

list(
  tar_target(data, "data_1.csv", format = "file"),
  tar_target(values, 1:6),
  tar_target(analysis, do_analysis(data, values),
             pattern = map(values)),
  tar_target(report, make_report(analysis))
)

Parallel Computing - With

library(targets)
library(crew)

source("functions.R")

tar_option_set(
  controller = crew_controller_local(workers = 2)
)

list(
  tar_target(data, "data_1.csv", format = "file"),
  tar_target(values, 1:6),
  tar_target(analysis, do_analysis(data, values), 
             pattern = map(values)),
  tar_target(report, make_report(analysis))
)

tar_make()

More information

The user manual: https://books.ropensci.org/targets/

Summary

Targets handles dependencies for you and gives you tools for structuring a project
Relatively little overhead for a sound scaffold around your project
Encourages splitting code into functions, which makes each step easier to read, test and reuse
