What is reproducibility?

Naming things

TASK

Open any project folder on your computer and show it to your neighbor. Let them guess what the project is about and where the data, scripts, and results are.

File hierarchy

- My_Rproject
|-- Data
|----- input.txt
|----- cleaned_data.txt
|-- Scripts
|----- reading_data.R
|----- analysis.R
|----- extra_analysis.R
|-- Results
|----- results_table1.docx
|----- results_raw.xlsx
|-- Figures
|----- price_vs_house_size.png

File hierarchy

- My_Rproject
|-- README.txt
|-- Data
|----- input.txt
|----- cleaned_data.txt
|-- Scripts
|----- reading_data.R
|----- analysis.R
|----- extra_analysis.R
|-- Results
|----- results_table1.docx
|----- results_raw.xlsx
|-- Figures
|----- price_vs_house_size.png

Read me first!

README.txt file

  • should be in every project directory

  • includes:

    • short description of the purpose of the files
    • list of directories and files
    • description of files
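A README can be started from the folder contents. The following is a minimal sketch in R, assuming a hypothetical helper `write_readme_skeleton()` (not part of any package) and a throwaway demo project in a temporary directory. It only lists the files; the purpose and the file descriptions still have to be written by hand.

```r
# Hypothetical helper: writes a README.txt skeleton that lists every file in a project.
write_readme_skeleton <- function(project_dir, purpose = "TODO: describe the project") {
  # relative paths of all files below the project directory
  files <- sort(list.files(project_dir, recursive = TRUE))
  files <- setdiff(files, "README.txt")  # do not list the README itself
  lines <- c(
    paste("Project:", basename(project_dir)),
    paste("Purpose:", purpose),
    "",
    "Files:",
    paste0("  ", files, " : TODO describe")
  )
  readme_path <- file.path(project_dir, "README.txt")
  writeLines(lines, readme_path)
  invisible(readme_path)
}

# Usage on a small demo project (folder and file names are illustrative)
demo_dir <- file.path(tempdir(), "My_Rproject")
dir.create(file.path(demo_dir, "Data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "Scripts"), showWarnings = FALSE)
file.create(file.path(demo_dir, "Data", "input.txt"))
file.create(file.path(demo_dir, "Scripts", "analysis.R"))

readme_lines <- readLines(write_readme_skeleton(demo_dir, "House price analysis"))
print(readme_lines)
```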

TASK

Open a newer project folder.

Add a README file!

File naming

slides from Jenny Bryan

Three principles for creating names

  • machine readable
  • human readable
  • plays well with default ordering

NO                                              | YES
myabstract.docx                                 | 2014-06-08_abstract-for-sla.docx
Joe’s Filenames Use Spaces and Punctuation.xlsx | joes-filenames-are-getting-better.xlsx
figure 1.png                                    | fig01_scatterplot-talk-length-vs-interest.png
JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt | 1986-01-28_raw-data-from-challenger-o-rings.txt
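
To see what "machine readable" and "plays well with default ordering" buy in practice, here is a short sketch in R. It assumes the convention visible in the well-named examples: an ISO date first, "_" between fields, and "-" between words. The data frame and its column names are illustrative.

```r
# File names following the date_topic pattern
files <- c(
  "2014-06-08_abstract-for-sla.docx",
  "1986-01-28_raw-data-from-challenger-o-rings.txt"
)

# Machine readable: split each name into fields at "_"
stems  <- tools::file_path_sans_ext(files)
fields <- strsplit(stems, "_", fixed = TRUE)
file_info <- data.frame(
  date  = as.Date(vapply(fields, `[`, character(1), 1)),
  topic = vapply(fields, `[`, character(1), 2),
  ext   = tools::file_ext(files)
)
print(file_info)

# Default ordering: ISO dates (YYYY-MM-DD) sort chronologically even as plain text
sorted_files <- sort(files)
print(sorted_files)
```

A name such as "figure 1.png" offers nothing to split on reliably, which is why the delimiters matter more than the exact words.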

File formats

  • proprietary: .docx, .xlsx, .sas, …
  • open: .odt, .csv, .txt, .qmd, .rmd

TASK

How can we improve these names?

- My_Rproject
|-- README.txt
|-- Data
|----- input.txt
|----- cleaned_data.txt
|-- Scripts
|----- reading_data.R
|----- analysis.R
|----- extra_analysis.R
|-- Results
|----- results_table1.docx
|----- results_raw.xlsx
|-- Figures
|----- price_vs_house_size.png

One possible improvement:

- My_Rproject
|-- README.txt
|-- Data
|----- 00_orig_data.txt
|----- 01_cleaned_data.txt
|----- 02_cleaned_only_cases.csv
|-- Scripts
|----- 01_reading_data.R
|----- 02_descriptive_analysis.R
|----- 03_regression_analysis.R
|-- Results
|----- 02_table1_pop_characteristics.csv
|----- 03_regr_res_price_vs_house_size.csv
|-- Figures
|----- 03_price_vs_house_size_scatterplot.png
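
The numeric prefixes make the default (alphabetical) ordering match the order of the workflow, and the shared prefix appears to tie each result to the script that produced it. A small sketch in R of why the prefixes are zero-padded; the `*_step.R` names are made up for illustration.

```r
# Script names from the renamed hierarchy, deliberately shuffled
scripts <- c("03_regression_analysis.R", "01_reading_data.R", "02_descriptive_analysis.R")

# Alphabetical order is now also the order in which the scripts should run
run_order <- sort(scripts)
print(run_order)

# Zero-padding matters once there are ten or more steps (hypothetical names)
unpadded <- sort(paste0(c(1, 2, 10), "_step.R"))
padded   <- sort(paste0(sprintf("%02d", c(1, 2, 10)), "_step.R"))
print(unpadded)  # step 10 ends up before step 2
print(padded)    # steps appear in numeric order
```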

Variable naming

tmp <- sample(1:100, 50)
m <- mean(tmp)
r <- sqrt(sum((m - tmp)^2))

What does this code do and why?

# square root of the summed squared deviations from the sample mean
x_sample <- sample(1:100, 50)
sample_mean <- mean(x_sample)
sum_dev <- sqrt(sum((sample_mean - x_sample)^2))

Variable naming

  • use meaningful variable names

  • choose one naming scheme and stick to it
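
Sticking to one scheme can be checked mechanically. A minimal sketch in R, assuming snake_case is the chosen scheme (as in the renamed example) and using a hypothetical helper `is_snake_case()`:

```r
# Common schemes for the same name:
#   sample_mean   snake_case
#   sampleMean    camelCase
#   sample.mean   dot.case (valid in R, but dots are also used for S3 methods)

# Hypothetical helper: TRUE for names that follow snake_case
is_snake_case <- function(var_names) {
  grepl("^[a-z][a-z0-9]*(_[a-z0-9]+)*$", var_names)
}

var_names  <- c("x_sample", "sample_mean", "sum_dev", "sampleMean", "sample.mean", "tmp")
consistent <- is_snake_case(var_names)
print(data.frame(name = var_names, snake_case = consistent))
```

Note that `tmp` passes the check: a pattern can enforce a consistent scheme, but whether a name is meaningful still has to be judged by a person.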