Getting started with prodigenr
Luke W. Johnston
2024-12-14
Source: vignettes/prodigenr.Rmd
prodigenr, or project directory generator, was designed to simplify the process of creating new scientific data analysis projects and to help make your workflow more reproducible and open from the beginning. Creating folders and files manually for a single new project doesn’t take much time, but across many projects and many researchers it quickly adds up. Plus, following a standard structure makes it easier to share code and to establish reproducible practices early in the project.
Setting up a project with prodigenr
Starting a research project? Create a project directory like so:
library(prodigenr)
# Build a path inside the session's temporary directory using the fs package
new_project_path <- fs::path_temp("HeartDiseaseExercise")
setup_project(new_project_path)
Or via RStudio’s “New Project” interface (with RStudio version >1.1).
The resulting file structure should look something like this:
HeartDiseaseExercise
├── .git
│   ├── HEAD
│   ├── config
│   ├── description
│   ├── hooks
│   │   └── README.sample
│   ├── info
│   │   └── exclude
│   ├── objects
│   │   ├── info
│   │   └── pack
│   └── refs
│       ├── heads
│       └── tags
├── .gitignore
├── DESCRIPTION
├── HeartDiseaseExercise.Rproj
├── R
│   └── README.md
├── README.md
├── TODO.md
├── data
│   └── README.md
├── data-raw
│   └── README.md
└── docs
    └── README.md
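If you want to print a listing like the one above yourself, one option is `fs::dir_tree()`. The sketch below is only an illustration: it builds a small stand-in folder (the name `TreeDemoProject` and its contents are made up for this example) so it can run without prodigenr. With a real project you would pass the path you gave to `setup_project()` instead.

```r
library(fs)

# Stand-in for a generated project: a temporary folder with two
# subfolders and one hidden file (all names here are illustrative).
demo_path <- path_temp("TreeDemoProject")
dir_create(path(demo_path, c("R", "docs")))
file_create(path(demo_path, ".gitignore"))

# Print the directory tree. `all = TRUE` is passed on to dir_ls() so
# that hidden entries (such as .git or .gitignore) are listed too.
dir_tree(demo_path, all = TRUE)
```

Hidden entries are skipped by default, so leave out `all = TRUE` if you only want to see the visible files.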
Each project, and each folder within it, contains a README.md file that explains in more detail what each folder and file is used for. Briefly:
- R/: Should contain the R scripts and functions used for the analysis.
- docs/: Should contain the files related to presenting the project’s scientific output. Already has the report/manuscript inside.
- data/: If relevant, is where the processed (or simulated) data used for the project is kept, as well as the results of the project’s analyses.
- data-raw/: If relevant, is where the scripts that process the raw data into the usable data are kept and, optionally, where the raw data is also kept.
- DESCRIPTION: Is a standard file that includes metadata about your project in a machine-readable format, and that also stores a list of the R packages your project depends on.
- .Rproj: Is a standard file used by RStudio to set some R Project specific settings.
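As a quick sanity check of this layout, the following base R sketch reports which of the folders described above exist in a given directory. The helper `check_project_folders()` is hypothetical and not part of prodigenr, and the demonstration runs on a throwaway directory rather than a real generated project.

```r
# Hypothetical helper (not part of prodigenr): report which of the
# expected project folders exist under `project_path`.
check_project_folders <- function(project_path,
                                  expected = c("R", "docs", "data", "data-raw")) {
  present <- dir.exists(file.path(project_path, expected))
  stats::setNames(present, expected)
}

# Demonstration on a throwaway directory that has only two of the folders.
demo_path <- file.path(tempdir(), "FolderCheckDemo")
dir.create(file.path(demo_path, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_path, "docs"), showWarnings = FALSE)

status <- check_project_folders(demo_path)
print(status)
```

In a real project you would call the helper with the project's own path. Any `FALSE` entry then points to a folder that has been deleted or renamed.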
To add a new document (e.g. slides, manuscript), run any of the create_*() commands (e.g. create_slides()) in the console while in RStudio in the newly created project (opened via the .Rproj file):
# you need to run these in the project's console
create_slides()
#> ✔ Created the docs/slides.qmd!
Now a new file, slides.qmd, has been added to the docs/ folder.
The resulting file structure should look something like this:
HeartDiseaseExercise
├── DESCRIPTION
├── HeartDiseaseExercise.Rproj
├── R
│   └── README.md
├── README.md
├── TODO.md
├── data
│   └── README.md
├── data-raw
│   └── README.md
└── docs
    ├── README.md
    └── slides.qmd
At present, there are only two templates, which you can view with:
template_list
#> [1] "report" "slides"
These templates cover the documents an academic researcher is most likely to need. However, if you have a suggestion or want to add a template, please create a GitHub issue or submit a Pull Request!
The end goal is for each project to be as self-contained as possible, so that if you ever need to go back to the analysis, it is easy to re-run the code and reproduce the results you reported. This is especially useful if others, such as reviewers, ask for something or want to confirm your results. See the manifesto for more details on the underlying philosophy behind this package.
Related packages or projects
There are several ways of handling a project. A few packages and approaches offer functionality similar to prodigenr:
- ProjectTemplate is well documented and still actively developed. Its only downside is that it is fairly complicated to use and creates a complex project workflow.
- makeProject is very simple and stripped down, both in what it creates and in its use. Its downside is that it has not been updated since 2012.
- Use of the R package structure via devtools (or usethis), which is argued for in this blog (and compared to ProjectTemplate in this blog).
- rrtools is very similar to prodigenr, except it focuses only on manuscripts. It is well thought out and its documentation is well written.
- workflowr is a workflow for creating online data science content.
There is also a list of other similar projects on the rOpenSci GitHub repository. It’s up to you to decide which style to use.