Reproducible Research Workflows in the Era of Big Data and Machine Learning

Workshop prepared for NOFE Young Investigator Day, 28.10.2025.

Source on Codeberg.

Schedule

PART 1: “Why reproducibility?” (1 hr)

  • DISCUSSION: What is reproducibility? Which tools do you know, and which do you already use? How can you do it better?
  • NAMING THINGS: file organization, file names, file formats
  • Short hands-on
  • Check out the presentation

PART 2: “When to think about reproducibility?” (1 hr)

  • Walk-through of a typical scientific project: how to make it reproducible at each step
  • Discussion and hands-on
  • Check out the presentation

PART 3: “How to do reproducible reports?” (1 hr)

  • Quarto: create beautiful, reproducible reports in the programming language of your choice
  • Hands-on

Check out the example reports.

Requirements

To get the most out of this workshop, you should:

  • have tried analysis of data
  • have tried writing about the results, creating a presentation or a poster
  • have been frustrated by having to re-run analyses and keep track of all the versions
  • be willing to change your habits to make your science more reproducible

Technical requirements

NB: We could not confirm the instructions for running STATA from Python, so we do not recommend that route.

  • PC
  • installed Quarto
  • installed R, Python, or STATA
  • installed Visual Studio Code or RStudio
  • if using STATA:
    • using STATA in Quarto requires R for the setup, so install R as well
    • then open a terminal (e.g., PowerShell on Windows or the terminal in VS Code), write “R” and press Enter
    • now, in R, write: install.packages("Statamarkdown") and press Enter
    • if the installation was successful, you’re ready to go!
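If you want to check your setup before the workshop, a small script along these lines may help. It is only a sketch: the executable names in CANDIDATES are assumptions and can differ between operating systems and installers, and STATA is left out on purpose because its executable name varies by edition and platform.

```python
import shutil

# Assumed executable names for each requirement (hypothetical mapping);
# adjust them to match how the tools are installed on your machine.
CANDIDATES = {
    "Quarto": ["quarto"],
    "R": ["Rscript", "R"],
    "Python": ["python3", "python"],
}


def find_tools(candidates=CANDIDATES, which=shutil.which):
    """Return {tool: path or None}, using the first candidate found on PATH."""
    found = {}
    for tool, names in candidates.items():
        paths = (which(name) for name in names)
        found[tool] = next((p for p in paths if p), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        status = path if path else "not found on PATH"
        print(f"{tool}: {status}")
```

A tool reported as "not found on PATH" may still be installed; it only means the terminal cannot find it under the assumed name.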