Or How I Learnt to Stop Worrying and Love the DAG
2025-02-26
Targets is a framework for declaratively specifying an analysis pipeline, controlling dependencies, and only running what needs to be re-run.
It’s a build system for analytics.
path/to/project$ ls
my_data.xlsx
work.R
path/to/project$ ls -a
my_data.xlsx
work.R
.RData
path/to/project$ ls
my_data.xlsx
auxiliary_data.csv
output.qmd
output.rmd
output.html
data_clean.R
analysis.R
plots.R
data_process.R
path/to/project$ ls
data/
my_data.xlsx
auxiliary_data.csv
reports/
output.qmd
output.html
01_data_clean.R
02_data_process.R
03_analysis.R
04_plots.R
R code in the console
A saved script
A structured script/collection of scripts
A Targets Program
Docker/VMs
“I don’t know what these files are”
“How does this project fit together?”
“How to run this project?”
“Is everything up to date?”
“Well, it works on my machine!”
path/to/project$ ls
_targets.R
functions.R
_targets.R: where you define your “plan”.
functions.R: where you store helper functions.
Targets will make a _targets/ directory; you don’t need to go in there.
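As an illustration of what functions.R might hold, here is a minimal sketch in base R. The talk does not show this file, so the bodies of read_data(), do_analysis(), and make_report() below are assumptions made up for the example; only the function names echo ones used in the pipelines later on.

```r
# functions.R (hypothetical contents, written for illustration only)

# Read a CSV file into a data frame.
read_data <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Toy analysis: the mean of column `x`, scaled by `value`.
do_analysis <- function(data, value = 1) {
  mean(data$x) * value
}

# Collapse any number of results into one line of text.
make_report <- function(...) {
  paste("Results:", paste(c(...), collapse = ", "))
}

# Quick check outside of targets, using a throwaway CSV file.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(1, 2, 3)), tmp, row.names = FALSE)
demo_data <- read_data(tmp)
demo_report <- make_report(do_analysis(demo_data, 1), do_analysis(demo_data, 2))
print(demo_report)
```

Keeping the helpers as plain functions means they can be checked on their own, as in the last few lines, before they are wired into a pipeline.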
_targets.R
▶ dispatched target data_path
● completed target data_path [0 seconds, 68 bytes]
▶ dispatched target data
● completed target data [0 seconds, 1.091 kilobytes]
▶ dispatched target analysis
● completed target analysis [0 seconds, 44 bytes]
▶ dispatched target graph
● completed target graph [0.114 seconds, 88.617 kilobytes]
▶ dispatched target report
● completed target report [0 seconds, 44 bytes]
▶ ended pipeline [0.204 seconds]
Steps in our process are defined with tar_target()
Targets are defined as the output of a function
We can see the status using tar_visnetwork()
We can run the whole pipeline with tar_make()
We can retrieve the results of completed targets with tar_read() or tar_load()
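The points above can be tried end to end in a throwaway directory. This is a minimal sketch, not the pipeline from the talk: the CSV contents and the target bodies are made up, and tar_dir() and tar_script() are used only so that the example writes its _targets.R somewhere temporary.

```r
library(targets)

# tar_dir() evaluates the code inside a temporary directory.
tar_dir({
  # Hypothetical input data, invented for this example.
  write.csv(data.frame(x = c(1, 2, 3)), "data.csv", row.names = FALSE)

  # tar_script() writes the pipeline definition to _targets.R.
  tar_script(
    list(
      tar_target(data_path, "data.csv", format = "file"),
      tar_target(data, read.csv(data_path)),
      tar_target(analysis, mean(data$x)),
      tar_target(report, paste("Mean of x:", analysis))
    ),
    ask = FALSE
  )

  tar_make()                        # run every target that is out of date
  report_text <- tar_read(report)   # return the stored value of a target
  tar_load(analysis)                # or load it under its own name
  # tar_visnetwork() would draw the dependency graph (needs the visNetwork package).
})

print(report_text)
```

In a real project the same three calls (tar_make(), tar_read(), tar_load()) are run from the project root, next to _targets.R.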
▶ dispatched target analysis
▶ recorded workspace analysis
✖ errored target analysis
✖ errored pipeline [0.068 seconds]
Error:
! targets::tar_make() error
• tar_errored()
• tar_meta(fields = any_of("error"), complete_only = TRUE)
• tar_workspace()
• tar_workspaces()
• Debug: https://books.ropensci.org/targets/debugging.html
• Help: https://books.ropensci.org/targets/help.html
analysis failed
do_analysis(process, fail = TRUE)
stop("analysis failed")
.handleSimpleError(function (condition) { state$error <- build_mess...
h(simpleError(msg, call))
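The bullet points in the error message name the main debugging helpers. The sketch below reproduces a failure in a temporary directory and inspects it. The two-target pipeline is invented for the example, and workspace_on_error = TRUE is set explicitly on the assumption that the "recorded workspace" line in the log comes from that option.

```r
library(targets)

tar_dir({
  tar_script({
    # Save the failed target's dependencies so they can be reloaded later.
    tar_option_set(workspace_on_error = TRUE)
    list(
      tar_target(process, c(1, 2, 3)),
      # Hypothetical failing step, standing in for do_analysis(process, fail = TRUE).
      tar_target(analysis, if (length(process) > 0) stop("analysis failed"))
    )
  }, ask = FALSE)

  try(tar_make(), silent = TRUE)   # the pipeline is expected to error here

  failed <- tar_errored()          # names of targets that errored
  saved <- tar_workspaces()        # names of targets with a saved workspace
  errors <- tar_meta(fields = any_of("error"), complete_only = TRUE)

  # Load the upstream values of `analysis` (here, `process`) for interactive debugging.
  tar_workspace(analysis)
})

print(failed)
print(errors)
```

After tar_workspace(analysis), the failing command can be re-run by hand with the same inputs the pipeline used.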
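library(targets)
source("functions.R")  # defines read_data(), run_process_1(), run_process_2(), make_report()

list(
  # format = "file" tracks the files themselves, so editing a CSV invalidates its targets
  tar_target(data_path_1, "data_1.csv", format = "file"),
  tar_target(data_path_2, "data_2.csv", format = "file"),
  # Two independent branches, one per input file
  tar_target(data_1, read_data(data_path_1)),
  tar_target(data_2, read_data(data_path_2)),
  tar_target(process_1, run_process_1(data_1)),
  tar_target(process_2, run_process_2(data_2)),
  # The report depends on both branches
  tar_target(report, make_report(process_1, process_2))
)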
▶ dispatched target data_path_1
● completed target data_path_1 [0 seconds, 194 bytes]
▶ dispatched target data_path_2
● completed target data_path_2 [0 seconds, 0 bytes]
▶ dispatched target data_1
● completed target data_1 [0 seconds, 44 bytes]
▶ dispatched target data_2
● completed target data_2 [0.001 seconds, 44 bytes]
▶ dispatched target process_1
● completed target process_1 [0.001 seconds, 44 bytes]
▶ dispatched target process_2
● completed target process_2 [0.001 seconds, 44 bytes]
▶ dispatched target report
● completed target report [0 seconds, 44 bytes]
▶ ended pipeline [0.072 seconds]
data_1 changes
▶ dispatched target data_path_1
● completed target data_path_1 [0.001 seconds, 194 bytes]
✔ skipped target data_path_2
▶ dispatched target data_1
● completed target data_1 [0 seconds, 44 bytes]
✔ skipped target data_2
✔ skipped target process_1
✔ skipped target process_2
✔ skipped target report
▶ ended pipeline [0.07 seconds]
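The skipping behaviour in the two logs above can be reproduced with a small experiment. This sketch uses an invented one-column CSV and base R calls in place of the talk's helper functions, plus a made-up target named unrelated that does not depend on the file. tar_outdated() lists what the next tar_make() would rebuild.

```r
library(targets)

tar_dir({
  # Hypothetical input file, invented for this example.
  writeLines(c("x", "1", "2"), "data_1.csv")

  tar_script(
    list(
      tar_target(data_path_1, "data_1.csv", format = "file"),
      tar_target(data_1, read.csv(data_path_1)),
      tar_target(total_1, sum(data_1$x)),
      tar_target(unrelated, 42)   # does not depend on the CSV
    ),
    ask = FALSE
  )

  tar_make()                          # first run builds every target
  outdated_before <- tar_outdated()   # nothing is left to rebuild

  # Edit the tracked file, then ask what is now out of date.
  writeLines(c("x", "1", "2", "3"), "data_1.csv")
  outdated_after <- tar_outdated()

  tar_make()                          # rebuilds the file's targets, skips `unrelated`
  new_total <- tar_read(total_1)
})

print(outdated_after)
print(new_total)
```

Here the edit changes the data, so everything downstream of the file is rebuilt. In the log above, process_1 is skipped even though data_1 was re-run, which is what targets does when a re-run target returns the same value as before.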
Load library(tarchetypes) to get tar_quarto()
Use tar_quarto() in the place of a tar_target()
Use tar_read() or tar_load() in your QMD file
Build plots as targets and tar_read() them into your QMD rather than including the ggplot code in your QMD
library(targets)
source("functions.R")

list(
  tar_target(data, "data_1.csv", format = "file"),
  tar_target(analysis_1, do_analysis(data, value = 1)),
  tar_target(analysis_2, do_analysis(data, value = 2)),
  tar_target(analysis_3, do_analysis(data, value = 3)),
  tar_target(analysis_4, do_analysis(data, value = 4)),
  tar_target(analysis_5, do_analysis(data, value = 5)),
  tar_target(analysis_6, do_analysis(data, value = 6)),
  tar_target(
    report,
    make_report(analysis_1, analysis_2, analysis_3, analysis_4, analysis_5, analysis_6)
  )
)
map, cross, slice, sample and more
library(targets)
library(crew)
source("functions.R")

# Run up to two targets at a time on local worker processes
tar_option_set(
  controller = crew_controller_local(workers = 2)
)

list(
  tar_target(data, "data_1.csv", format = "file"),
  tar_target(values, 1:6),
  # Dynamic branching: one branch of analysis per element of values
  tar_target(
    analysis,
    do_analysis(data, values),
    pattern = map(values)
  ),
  tar_target(report, make_report(analysis))
)
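To see what pattern = map(values) produces, here is a small self-contained sketch. It swaps the talk's do_analysis() for a made-up squaring step and leaves out the crew controller so that it runs with targets alone.

```r
library(targets)

tar_dir({
  tar_script(
    list(
      tar_target(values, 1:6),
      # One branch per element of `values`.
      tar_target(squared, values^2, pattern = map(values)),
      # Downstream targets see the branches combined into a single vector.
      tar_target(total, sum(squared))
    ),
    ask = FALSE
  )

  tar_make()
  all_branches <- unname(tar_read(squared))                # all six results
  first_two <- unname(tar_read(squared, branches = 1:2))   # only the first two branches
  total_value <- tar_read(total)
})

print(all_branches)
print(total_value)
```

Each branch is stored and tracked separately, so changing one element of values should only re-run the branch that depends on it.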
The user manual: https://books.ropensci.org/targets/