Project walkthrough

Task

What would help make these steps reproducible?

  • receiving data (first look)
  • data exploration and cleaning
  • analysis
  • creating visualizations
  • preparing paper
  • submitting paper and data

Task

  • receiving data (first look)
    • copy and store original data untouched!
    • create folder structure
  • data exploration and cleaning
    • tidy data
    • keep track of attempts
  • analysis
    • make scripts
    • keep notes and/or document scripts
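Here is a minimal Python sketch of the first two steps. It assumes a hypothetical folder layout (`data/raw`, `data/processed`, `scripts`, `figures`, `output`, `notes`) and hypothetical helpers `create_structure` and `store_raw`. The received file is copied into `data/raw` and made read-only. Its SHA-256 checksum is then logged so that accidental edits to the original can be detected later.

```python
import hashlib
import shutil
import stat
from pathlib import Path

# Hypothetical folder layout; adapt the names to your own conventions.
FOLDERS = ["data/raw", "data/processed", "scripts", "figures", "output", "notes"]


def create_structure(root: Path) -> None:
    """Create the project folders; folders that already exist are left alone."""
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)


def store_raw(source: Path, root: Path) -> str:
    """Copy a received file into data/raw, make the copy read-only, log its SHA-256."""
    target = root / "data" / "raw" / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; raw data is never overwritten")
    shutil.copy2(source, target)
    # Drop write permissions so the original cannot be edited by accident.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    # Append "checksum  filename" (the layout sha256sum uses) to a checksum list.
    with (root / "data" / "raw" / "CHECKSUMS.txt").open("a", encoding="utf-8") as log:
        log.write(f"{digest}  {source.name}\n")
    return digest
```

All later cleaning steps read from `data/raw` and write to `data/processed`, so the original never changes.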

Task

  • creating visualizations
    • use scripts, don’t change the graph by hand!
    • meaningful naming of figure files
  • preparing paper
    • reproducible reports: Quarto / R Markdown
  • submitting paper and data
    • save results in open file formats
    • tidy data!
    • create a repository and share everything, including the metadata
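The sketch below shows a figure produced entirely by a script. It gets a meaningful, sortable file name and is saved in an open format (SVG). To stay self-contained it uses only the Python standard library; in practice a plotting library would draw the chart (for example ggplot2 with `ggsave()` in R). The helper names and the `figNN_description.svg` naming scheme are assumptions for illustration.

```python
import re
from pathlib import Path
from xml.sax.saxutils import escape


def slug(text: str) -> str:
    """Lower-case text and collapse anything that is not a letter or digit into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def figure_path(folder: Path, number: int, description: str) -> Path:
    """Hypothetical naming scheme: figNN_short_description.svg (sorts in figure order)."""
    return folder / f"fig{number:02d}_{slug(description)}.svg"


def bar_chart_svg(values: dict, bar_width: int = 40, height: int = 200) -> str:
    """Return a minimal SVG bar chart; assumes non-empty, non-negative values."""
    top = max(values.values())
    bars = []
    for i, (label, value) in enumerate(values.items()):
        bar_height = height * value / top if top else 0.0
        x = i * (bar_width + 10)
        bars.append(
            f'<rect x="{x}" y="{height - bar_height:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}"><title>{escape(str(label))}</title></rect>'
        )
    width = len(values) * (bar_width + 10)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(bars)
        + "</svg>"
    )


def save_figure(values: dict, folder: Path, number: int, description: str) -> Path:
    """Draw the chart from the data and (over)write the file; never edit it by hand."""
    folder.mkdir(parents=True, exist_ok=True)
    path = figure_path(folder, number, description)
    path.write_text(bar_chart_svg(values), encoding="utf-8")
    return path
```

Re-running the script regenerates the same file from the data. Any change to the figure is therefore a change to the code, and that change can be tracked.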

Reproducible research

  • analysis steps
  • inputs, outputs
  • code used to generate each figure
  • notes
  • versions of software, packages, etc.
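One way to capture software versions alongside the results is sketched below in Python with the standard library. `environment_report` is a hypothetical helper, and the list of packages to report is up to you. In R, `sessionInfo()` prints comparable information.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages: list[str]) -> str:
    """Return Python, OS and package versions as plain text, one 'name: version' per line."""
    lines = [
        f"python: {sys.version.split()[0]}",
        f"platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines) + "\n"
```

Writing this report to a file next to the results (for instance `output/versions.txt`, a hypothetical name) keeps inputs, outputs and versions together.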

Share all that can be shared!

Information is gold

  • data, data, data …

  • meta-information is equally important!

    • logs of changes
    • failed trials (not publishable, but very useful!)
    • documentation of code and datasets
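Below is a minimal sketch of a log that also keeps failed trials. It assumes one JSON object per line (JSON Lines) in a hypothetical `trials.jsonl` file. `log_trial` and its fields are illustrative, not a standard.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_trial(log_file: Path, description: str, outcome: str, **details) -> dict:
    """Append one trial (successful or not) as a JSON line and return the entry."""
    entry = {
        **details,  # free-form extras, e.g. reason="..."; the fixed keys below win on clashes
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "description": description,
        "outcome": outcome,
    }
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

Because each line is independent, the log can be appended to from any script and later read back or filtered with any JSON tool.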

Task

Which of these three tables is best to:

  • look at
  • re-use in research

Why?

Find the tables in data/table{1,2,3}
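The slides do not spell out how the three tables differ. As a general illustration of tidy data (each variable is a column, each observation is a row), the hypothetical helper below reshapes a "wide" table into a long one. In the wide table there is, for example, one column per year; the long table is easier to filter, join and plot.

```python
def wide_to_long(rows: list[dict], id_column: str, key_name: str, value_name: str) -> list[dict]:
    """Reshape wide rows into tidy rows: one (id, key, value) observation per row.

    Every column except `id_column` is treated as a key, e.g. one column per year.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column != id_column:
                tidy.append({id_column: row[id_column], key_name: column, value_name: value})
    return tidy
```

In R the same reshaping is done with `tidyr::pivot_longer()`, and in pandas with `DataFrame.melt()`.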


Task

Open a recent project, find some table with results and amend if needed:

  • create a table that is easy to re-use: no captions, no explanations, data starting in the very first row and column, numbers stored as numbers (not text), no merged cells, etc.

  • is it in a proprietary format? Change it to .csv!

  • does it have punctuation marks in column names? Fix the naming!

  • is there no accompanying plain-text document (not Word!) explaining the column names? Create one!
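The clean-up steps can be sketched in Python as follows. The sketch assumes the table has already been read into a header plus rows; reading `.xlsx` files needs a third-party package, e.g. pandas `read_excel`. `clean_name` and `export_reusable` are hypothetical helpers. Together they:

- lower-case the column names and replace punctuation and spaces with underscores
- write a CSV file
- write a plain-text codebook with one line per column

```python
import csv
import re
from pathlib import Path


def clean_name(name: str) -> str:
    """Turn a header such as 'Weight (kg)' into 'weight_kg'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def export_reusable(header: list[str], rows: list[list], descriptions: dict, csv_path: Path) -> Path:
    """Write a CSV with cleaned column names plus a plain-text codebook next to it.

    `descriptions` maps each original column name to its meaning; columns without
    a description get a TODO line. Returns the path of the codebook.
    """
    names = [clean_name(h) for h in header]
    if len(set(names)) != len(names):
        raise ValueError(f"cleaned column names are not unique: {names}")
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(names)
        writer.writerows(rows)
    # Hypothetical convention: measurements.csv -> measurements.txt
    codebook = csv_path.with_suffix(".txt")
    with codebook.open("w", encoding="utf-8") as handle:
        for original, name in zip(header, names):
            handle.write(f"{name}: {descriptions.get(original, 'TODO: describe')}\n")
    return codebook
```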


Book

Building reproducible analytical pipelines with R by Bruno Rodrigues

Quarto


Quarto publishing system

  • alternative to R Markdown
  • can execute code in R, Python, Stata and Julia