Reproducible report

Author: Julia Romanowska
Affiliation: BIOS, UiB
Published: October 28, 2025

Introduction

The data is public and described on the webpage.

Let’s explore the dataset!

Data summary

We can look at a summary of the data:

skim(strep)
Data summary
Name strep
Number of rows 107
Number of columns 12
_______________________
Column type frequency:
factor 8
logical 1
numeric 3
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
arm 0 1.00 FALSE 2 Str: 55, Con: 52
gender 0 1.00 FALSE 2 F: 59, M: 48
baseline_condition 0 1.00 FALSE 3 3_P: 54, 2_F: 37, 1_G: 16
baseline_temp 0 1.00 FALSE 6 4_>: 43, 3_1: 31, 2_9: 24, 1_<: 7
baseline_esr 1 0.99 FALSE 3 4_5: 65, 3_2: 36, 2_1: 5
baseline_cavitation 0 1.00 FALSE 2 yes: 62, no: 45
strep_resistance 0 1.00 FALSE 3 1_s: 65, 3_r: 34, 2_m: 8
radiologic_6m 0 1.00 FALSE 6 6_C: 32, 5_M: 23, 1_D: 18, 3_M: 17

Variable type: logical

skim_variable n_missing complete_rate mean count
improved 0 1 0.51 TRU: 55, FAL: 52

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
patient_id 0 1 54.00 31.03 1 27.5 54 80.5 107 ▇▇▇▇▇
dose_strep_g 0 1 1.03 1.00 0 0.0 2 2.0 2 ▇▁▁▁▇
rad_num 0 1 3.93 1.89 1 2.0 5 6.0 6 ▇▅▁▆▇

The data has 107 rows and 12 columns. Three columns are numeric and one is logical, but the majority (eight) are categorical factors.

Missingness

Is the dataset complete?

gg_miss_var(strep)

Almost! Only one value is missing, in the baseline_esr column.
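The same check can be made numerically rather than from a plot. The following is a minimal sketch in base R; the data frame `toy` and its values are invented for illustration and only stand in for `strep`.

```r
# Toy stand-in for `strep`; the values are invented for illustration only
toy <- data.frame(
  arm = c("Control", "Streptomycin", "Control", "Streptomycin"),
  baseline_esr = c("4_51+", NA, "3_21-50", "4_51+"),
  stringsAsFactors = TRUE
)

# Number of missing values in each column
n_missing <- colSums(is.na(toy))

# Keep only the columns that have at least one missing value
n_missing[n_missing > 0]

# Row numbers of the incomplete records
which(!complete.cases(toy))
```

Replacing `toy` with `strep` should list the column and the row that hold the single missing value.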

Methods

We will check whether streptomycin helped patients with tuberculosis, using visualizations and logistic regression.
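For reference, a logistic regression with treatment arm as the only predictor can be written as below. This is a sketch of the standard model form, assuming that Control is the reference level of arm; the report does not state the factor coding explicitly.

```latex
\operatorname{logit}\bigl(P(\text{improved}_i = 1)\bigr)
  = \beta_0 + \beta_1 \, \mathbb{1}[\text{arm}_i = \text{Streptomycin}],
\qquad
\text{OR} = e^{\beta_1}
```

Here $\beta_0$ is the log-odds of improvement in the control arm and $e^{\beta_1}$ is the odds ratio of improvement for streptomycin versus control.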

Results

First, let’s check whether there was any difference between treatment arms in baseline measurements.

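# Compare baseline characteristics between arms;
# outcome, dose and ID columns are dropped first
strep |>
  select(
    -patient_id, -dose_strep_g, -strep_resistance,
    -radiologic_6m, -rad_num, -improved
  ) |>
  tbl_summary(by = arm) |>
  add_p()  # adds a p-value for the difference between arms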
Table 1: Baseline characteristics of the study population

Characteristic                          Control, N = 52¹  Streptomycin, N = 55¹  p-value²
gender                                                                           0.8
    F                                   28 (54%)          31 (56%)
    M                                   24 (46%)          24 (44%)
baseline_condition                                                               0.7
    1_Good                              8 (15%)           8 (15%)
    2_Fair                              20 (38%)          17 (31%)
    3_Poor                              24 (46%)          30 (55%)
baseline_temp                                                                    0.8
    1_<=98.9F/37.2C                     4 (7.7%)          3 (5.5%)
    2_99-99.9F/37.3-37.7C               12 (23%)          12 (22%)
    2_99-99.9F/37.3-37.7C/37.3-37.7C    0 (0%)            1 (1.8%)
    3_100-100.9F/37.8-38.2C             17 (33%)          14 (25%)
    3_100-100.9F/37.8-38.2C/37.8-38.2C  0 (0%)            1 (1.8%)
    4_>=101F/38.3C                      19 (37%)          24 (44%)
baseline_esr                                                                     0.6
    2_11-20                             2 (3.9%)          3 (5.5%)
    3_21-50                             20 (39%)          16 (29%)
    4_51+                               29 (57%)          36 (65%)
    Unknown                             1                 0
baseline_cavitation                     30 (58%)          32 (58%)               >0.9

¹ n (%)
² Pearson’s Chi-squared test; Fisher’s exact test

There were no significant differences between the arms at the start of the trial (all p-values were 0.6 or higher).
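As an illustration of where such a p-value comes from, the gender row of Table 1 can be re-tested by hand from the printed counts. This sketch assumes that an uncorrected Pearson chi-squared test was applied to this variable; that is an assumption about the defaults of add_p(), not something the report states.

```r
# Counts copied from the gender rows of Table 1
gender_by_arm <- matrix(
  c(28, 24,   # Control: F, M
    31, 24),  # Streptomycin: F, M
  nrow = 2,
  dimnames = list(gender = c("F", "M"), arm = c("Control", "Streptomycin"))
)

# Pearson chi-squared test without continuity correction
# (assumed here to match what the table reports for this variable)
res <- chisq.test(gender_by_arm, correct = FALSE)
res$p.value
```

Rounded to one decimal place, the result agrees with the 0.8 shown in the table.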

Was the treatment successful?

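# family = binomial makes glm() fit a logistic regression;
# without it, glm() defaults to a Gaussian (linear) model
model1 <- glm(improved ~ arm, data = strep, family = binomial)
model1_res <- model1 |> tidy()

The simplest logistic regression model (improved ~ arm) showed that patients who received streptomycin improved more often than controls (p < 0.001).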
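A p-value alone does not show the size of the effect. Coefficients of a logistic regression are on the log-odds scale, so exponentiating the arm coefficient gives an odds ratio. The sketch below uses invented counts in a data frame called `toy` (not the trial data) and a Wald-type confidence interval; all names are hypothetical.

```r
# Toy data, invented for illustration: 50 patients per arm,
# 15 improved in Control and 35 improved in Streptomycin
toy <- data.frame(
  arm = factor(
    rep(c("Control", "Streptomycin"), each = 50),
    levels = c("Control", "Streptomycin")  # Control is the reference level
  ),
  improved = c(
    rep(c(TRUE, FALSE), times = c(15, 35)),
    rep(c(TRUE, FALSE), times = c(35, 15))
  )
)

# Logistic regression of improvement on treatment arm
fit <- glm(improved ~ arm, data = toy, family = binomial)

# Log-odds ratio for Streptomycin vs. Control and its standard error
log_or <- coef(fit)[["armStreptomycin"]]
se <- sqrt(diag(vcov(fit)))[["armStreptomycin"]]

# Odds ratio with a 95% Wald confidence interval
odds_ratio <- exp(log_or)
ci_95 <- exp(log_or + c(-1, 1) * qnorm(0.975) * se)
```

With a single binary predictor, the fitted odds ratio equals the one computed directly from the 2x2 table, here (35/15) / (15/35). With the broom package loaded, `tidy(fit, exponentiate = TRUE, conf.int = TRUE)` returns a comparable summary in one call.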

Did anyone develop resistance to the antibiotic?

ggplot(strep) +
  aes(strep_resistance) +
  geom_bar(aes(fill = gender), position = position_dodge()) +
  facet_wrap(vars(arm)) +
  theme_minimal() +
  coord_flip() +
  labs(
    title = "Streptomycin resistance after 6 months of treatment, by arm and gender"
  ) +
  ylab("number of persons") +
  theme(
    axis.title.y = element_blank()
  )

Summary

sessionInfo()
R version 4.5.1 (2025-06-13)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8    
 [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8   
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Oslo
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] broom_1.0.8     skimr_2.1.5     naniar_1.1.0    gtsummary_2.2.0
 [5] here_1.0.1      lubridate_1.9.4 forcats_1.0.0   stringr_1.5.1  
 [9] dplyr_1.1.4     purrr_1.0.4     readr_2.1.5     tidyr_1.3.1    
[13] tibble_3.2.1    ggplot2_3.5.2   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gt_1.0.0          sass_0.4.10       generics_0.1.3    xml2_1.3.8       
 [5] stringi_1.8.7     hms_1.1.3         digest_0.6.37     magrittr_2.0.3   
 [9] evaluate_1.0.3    grid_4.5.1        timechange_0.3.0  cards_0.6.1      
[13] fastmap_1.2.0     rprojroot_2.0.4   jsonlite_2.0.0    cardx_0.2.5      
[17] backports_1.5.0   scales_1.3.0      cli_3.6.4         crayon_1.5.3     
[21] rlang_1.1.6       litedown_0.7      commonmark_1.9.5  bit64_4.6.0-1    
[25] munsell_0.5.1     base64enc_0.1-3   withr_3.0.2       repr_1.1.7       
[29] yaml_2.3.10       parallel_4.5.1    tools_4.5.1       tzdb_0.5.0       
[33] colorspace_2.1-1  vctrs_0.6.5       R6_2.6.1          lifecycle_1.0.4  
[37] bit_4.6.0         htmlwidgets_1.6.4 vroom_1.6.5       pkgconfig_2.0.3  
[41] pillar_1.10.2     gtable_0.3.6      glue_1.8.0        visdat_0.6.0     
[45] xfun_0.52         tidyselect_1.2.1  knitr_1.50        farver_2.1.2     
[49] htmltools_0.5.8.1 labeling_0.4.3    rmarkdown_2.29    compiler_4.5.1   
[53] markdown_2.0