Reproducible Research Workflows in the Era of Big Data and Machine Learning
Workshop prepared for NOFE Young Investigator Day, 28.10.2025.
Schedule
PART 1: “Why reproducibility?” (1 hr)
- DISCUSSION: What is reproducibility? Which tools do you know, and which do you already use? How can we do it better?
- NAMING THINGS: file organization, file names, file formats
- Short hands-on
- Check out the presentation
PART 2: “When to think about reproducibility?” (1 hr)
- Walk-through of a typical scientific project: how to make it reproducible at each step
- Discussion and hands-on
- Check out the presentation
PART 3: “How to create reproducible reports?” (1 hr)
- Quarto: create beautiful, reproducible reports in whatever programming language you use!
- Hands-on
- Check out the example reports:
Requirements
To get the most out of this workshop, you should:
- have tried analyzing data
- have tried writing up your results or creating a presentation or a poster
- have been frustrated by having to re-run analyses and keep track of all the versions
- be willing to change your habits to make your science more reproducible
Technical requirements
NB: We could not verify the instructions for running STATA from Python, so we do not recommend that approach.
- PC
- installed Quarto
- installed R, Python, or STATA
- installed Visual Studio Code or RStudio
- if using STATA:
  - using STATA in Quarto requires R (via the Statamarkdown package), so install R as well
  - then, open a terminal (e.g., PowerShell on Windows or the terminal in VS Code), type “R”, and press Enter (in PowerShell, type “R.exe” instead, because “R” is a built-in PowerShell alias)
  - now, in R, type “install.packages('Statamarkdown')” with straight quotes and press Enter
  - if the installation was successful, you’re ready to go!
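If Python is one of your tools, a small script like the one below can check, before the workshop, whether the command-line tools listed above can be found on your PATH. It is a minimal sketch under stated assumptions: the command names in `TOOLS` are guesses that you may need to adjust (RStudio and STATA are left out because their executable names vary between installations), and a tool reported as missing may still be installed, just not on the PATH.

```python
"""Check which workshop tools can be found on the PATH (a rough sketch)."""
import shutil

# Command names are assumptions; adjust them to your own installation.
# On Windows, Python is often available as "python" or "py" rather than "python3",
# and "code" is only available if VS Code was added to the PATH during installation.
TOOLS = ["quarto", "R", "Rscript", "python3", "code"]


def check_tools(names):
    """Return a dict mapping each command name to its full path, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in check_tools(TOOLS).items():
        status = path if path else "NOT FOUND"
        print(f"{name:10s} {status}")
```

You could save it, for example, as `check_tools.py` (a hypothetical file name) and run `python check_tools.py` in the terminal. If Quarto is found, `quarto check` gives a more detailed report, including whether Quarto can find your Python and R installations.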