Introduction to the msSPChelpR package - from long dataset to SIR analyses

Marian Eberl

26 October 2020

Introduction

This vignette explains how to use the package functions to get from a long dataset to SIR analyses.

For some functions, there are multiple variants of the same function built on different frameworks. They give the same results but differ in execution time and memory use.

Theory behind SIRs

The next version of this vignette will use this chapter to explain the theory behind how SIRs are calculated.
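As a minimal sketch of the standard definition (not necessarily the exact method used by the package): the SIR is the ratio of observed to expected cases, where the expected count is the sum over strata of person-years at risk multiplied by the reference incidence rate. The helper name `sir_sketch`, the toy numbers, and the choice of an exact Poisson confidence interval are assumptions made for illustration.

```r
# Hypothetical helper: SIR with an exact Poisson confidence interval.
# Assumption: this CI method is a common textbook choice and may differ
# from what msSPChelpR does internally.
sir_sketch <- function(observed, pyar, ref_rate, alpha = 0.05) {
  # expected cases = sum over strata of person-years * reference rate
  expected <- sum(pyar * ref_rate)
  obs <- sum(observed)
  # exact Poisson limits for the observed count, via the chi-square quantile
  lci <- if (obs == 0) 0 else stats::qchisq(alpha / 2, 2 * obs) / 2
  uci <- stats::qchisq(1 - alpha / 2, 2 * (obs + 1)) / 2
  c(observed = obs, expected = expected, sir = obs / expected,
    sir_lci = lci / expected, sir_uci = uci / expected)
}

# Toy strata (made-up numbers): observed cases, person-years,
# and reference rate per person-year
res <- sir_sketch(observed = c(3, 7),
                  pyar     = c(1000, 2500),
                  ref_rate = c(0.001, 0.002))
res
```

An SIR above 1 means more cases were observed than the reference rates would predict for the same amount of person-time.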

Examples

SEER lung cancer

Step 1 - Long dataset

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
library(msSPChelpR)
#Load synthetic dataset of patients with cancer to demonstrate package functions
data("us_second_cancer")

#This dataset is in long format, so each tumor is a separate row in the data
us_second_cancer
#> # A tibble: 113,999 × 16
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100004        3 SEER Reg … Male  White 1926-01-01 2006-06-15 C34        hist…
#>  4 100004        4 SEER Reg … Male  White 1926-01-01 2018-06-15 C14        DCO …
#>  5 100034        1 SEER Reg … Male  White 1979-01-01 2000-06-15 C50        hist…
#>  6 100037        1 SEER Reg … Fema… White 1938-01-01 1996-01-15 C54        hist…
#>  7 100038        1 SEER Reg … Male  White 1989-01-01 1991-04-15 C50        hist…
#>  8 100038        2 SEER Reg … Male  White 1989-01-01 2000-03-15 C80        hist…
#>  9 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#> 10 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#> # ℹ 113,989 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
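To make the long format concrete, here is a small base R sketch on made-up data (the data frame `toy_long` and its values are hypothetical). It counts tumor rows per patient, which shows why the number of rows in a long dataset exceeds the number of patients.

```r
# Toy long-format data (made-up IDs and sites): one row per tumor
toy_long <- data.frame(
  fake_id    = c("A", "A", "B", "C", "C", "C"),
  t_site_icd = c("C50", "C34", "C34", "C18", "C34", "C80"),
  stringsAsFactors = FALSE
)

# Number of tumor rows per patient
tumors_per_patient <- table(toy_long$fake_id)
tumors_per_patient

# Number of distinct patients vs. number of tumor rows
n_patients <- length(unique(toy_long$fake_id))
n_tumors   <- nrow(toy_long)
c(n_patients = n_patients, n_tumors = n_tumors)
```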

Step 2 - Filter long dataset

#filter for lung cancer
ids <- us_second_cancer %>%
  #detect ids with any lung cancer
  filter(t_site_icd == "C34") %>%
  #extract the IDs as a plain character vector
  pull(fake_id)

filtered_usdata <- us_second_cancer %>%
  #filter according to above detected ids with any lung cancer diagnosis
  filter(fake_id %in% ids) %>%
  arrange(fake_id)

filtered_usdata
#> # A tibble: 62,661 × 16
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg … Male  White 1926-01-01 1992-07-15 C50        hist…
#>  2 100004        2 SEER Reg … Male  White 1926-01-01 2004-01-15 C54        hist…
#>  3 100004        3 SEER Reg … Male  White 1926-01-01 2006-06-15 C34        hist…
#>  4 100004        4 SEER Reg … Male  White 1926-01-01 2018-06-15 C14        DCO …
#>  5 100039        1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50        hist…
#>  6 100039        2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34        hist…
#>  7 100039        3 SEER Reg … Fema… White 1946-01-01 2018-01-15 C80        hist…
#>  8 100073        1 SEER Reg … Male  White 1960-01-01 1993-11-15 C44        hist…
#>  9 100073        2 SEER Reg … Male  White 1960-01-01 2003-12-15 C34        hist…
#> 10 100143        1 SEER Reg … Male  White 1944-01-01 1992-03-15 C50        hist…
#> # ℹ 62,651 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> #   p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
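The two-stage filter above keeps every tumor row of a patient who has at least one lung cancer diagnosis, which is why the result still contains sites such as C50 or C54. The same pattern can be sketched in base R on made-up data (`toy_long` and `toy_filtered` are hypothetical names):

```r
# Toy long-format data (made-up values)
toy_long <- data.frame(
  fake_id    = c("A", "A", "B", "C", "C"),
  t_site_icd = c("C50", "C34", "C18", "C34", "C80"),
  stringsAsFactors = FALSE
)

# Stage 1: IDs of patients with at least one lung cancer (C34) row
ids <- unique(toy_long$fake_id[toy_long$t_site_icd == "C34"])

# Stage 2: keep all tumor rows of those patients, not only the C34 rows
toy_filtered <- toy_long[toy_long$fake_id %in% ids, ]
toy_filtered
```

Patient B has no C34 row and is dropped entirely; patients A and C keep all of their tumors.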

Step 3 - Renumber time_id

renumbered_usdata <- filtered_usdata %>%
  renumber_time_id(new_time_id_var = "t_tumid", 
                   dattype = "seer",
                   case_id_var = "fake_id")

renumbered_usdata %>%
   select(fake_id, sex, t_site_icd, t_datediag, t_tumid)
#> # A tibble: 62,661 × 5
#>    fake_id sex    t_site_icd t_datediag t_tumid
#>    <chr>   <chr>  <chr>      <date>       <int>
#>  1 100004  Male   C50        1992-07-15       1
#>  2 100004  Male   C54        2004-01-15       2
#>  3 100004  Male   C34        2006-06-15       3
#>  4 100004  Male   C14        2018-06-15       4
#>  5 100039  Female C50        2003-08-15       1
#>  6 100039  Female C34        2011-04-15       2
#>  7 100039  Female C80        2018-01-15       3
#>  8 100073  Male   C44        1993-11-15       1
#>  9 100073  Male   C34        2003-12-15       2
#> 10 100143  Male   C50        1992-03-15       1
#> # ℹ 62,651 more rows
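Conceptually, renumbering assigns a consecutive tumor index within each patient. The base R sketch below illustrates the idea on made-up data; ordering by diagnosis date is an assumption of this sketch, and `renumber_time_id()` may determine the order differently.

```r
# Toy data (made-up): sequence numbers with gaps, as could happen after filtering
toy <- data.frame(
  fake_id    = c("A", "A", "A", "B", "B"),
  SEQ_NUM    = c(1, 3, 4, 2, 5),
  t_datediag = as.Date(c("1992-07-15", "2006-06-15", "2018-06-15",
                         "2003-08-15", "2011-04-15")),
  stringsAsFactors = FALSE
)

# Sort by patient and diagnosis date (assumption), then count within patient
toy <- toy[order(toy$fake_id, toy$t_datediag), ]
toy$t_tumid <- ave(seq_along(toy$fake_id), toy$fake_id, FUN = seq_along)
toy
```

The new index has no gaps, so it can serve as a column name suffix when the data is later reshaped to wide format.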

Step 4 - Reshape to wide dataset

usdata_wide <- renumbered_usdata %>%
  reshape_wide_tidyr(case_id_var = "fake_id", time_id_var = "t_tumid", timevar_max = 10)

#Now the data is in the wide format required by many package functions.
#This means each case is one row, and additional tumors per case ID are
#added as new columns, using the time_id as column name suffix.
usdata_wide
#> # A tibble: 31,997 × 136
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi… Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec… Fema… White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr… Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec… Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec… Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec… Fema… White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec… Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr… Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr… Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr… Fema… White  1956-01-01  2010-07-15  
#> # ℹ 31,987 more rows
#> # ℹ 129 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> #   fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> #   fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> #   sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> #   t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> #   datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
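The long-to-wide transformation can be illustrated with `stats::reshape()` from base R on made-up data. This is only a sketch of the idea and not a claim about how `reshape_wide_tidyr()` is implemented; `toy_long` and `toy_wide` are hypothetical names.

```r
# Toy long-format data (made-up): patient A has two tumors, patient B has one
toy_long <- data.frame(
  fake_id    = c("A", "A", "B"),
  t_tumid    = c(1L, 2L, 1L),
  sex        = c("Male", "Male", "Female"),
  t_site_icd = c("C50", "C34", "C34"),
  stringsAsFactors = FALSE
)

# One row per patient; every other variable is repeated with the
# tumor index as suffix (sex.1, t_site_icd.1, sex.2, t_site_icd.2)
toy_wide <- stats::reshape(toy_long,
                           idvar     = "fake_id",
                           timevar   = "t_tumid",
                           direction = "wide",
                           sep       = ".")
toy_wide
```

Patients with fewer tumors than the maximum get `NA` in the columns of the missing tumors, which is what later steps use to detect whether a second tumor exists.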

Step 5 - Recalculate p_spc


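usdata_wide <- usdata_wide %>%
  #p_spc: label whether a second tumor (t_site_icd.2) is recorded
  dplyr::mutate(p_spc = dplyr::case_when(is.na(t_site_icd.2)  ~ "No SPC",
                                         !is.na(t_site_icd.2) ~ "SPC developed",
                                         TRUE ~ NA_character_)) %>%
  #create a numeric variable count_spc from the same information
  #(note: as coded here, count_spc is 1 when no second tumor is recorded and 0 otherwise)
  dplyr::mutate(count_spc = dplyr::case_when(is.na(t_site_icd.2) ~ 1,
                                             TRUE ~ 0))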
usdata_wide %>%
   dplyr::select(fake_id, sex.1, p_spc, count_spc, t_site_icd.1, 
                 t_datediag.1, t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#>    fake_id sex.1  p_spc         count_spc t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <chr>  <chr>             <dbl> <chr>        <date>       <chr>       
#>  1 100004  Male   SPC developed         0 C50          1992-07-15   C54         
#>  2 100039  Female SPC developed         0 C50          2003-08-15   C34         
#>  3 100073  Male   SPC developed         0 C44          1993-11-15   C34         
#>  4 100143  Male   SPC developed         0 C50          1992-03-15   C34         
#>  5 100182  Male   SPC developed         0 C18          1991-09-15   C34         
#>  6 100197  Female SPC developed         0 C34          2012-06-15   C50         
#>  7 100208  Male   No SPC                1 C34          2019-11-15   <NA>        
#>  8 100230  Male   SPC developed         0 C44          1992-11-15   C34         
#>  9 100234  Male   No SPC                1 C34          2010-02-15   <NA>        
#> 10 100266  Female No SPC                1 C34          2010-07-15   <NA>        
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>
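Because the label variable and the numeric variable are derived from the same condition, a cross-tabulation is a quick way to confirm which numeric value goes with which label before the variable is passed on as `count_var`. The sketch below mirrors the coding used above on made-up data (`toy_wide` is a hypothetical data frame):

```r
# Toy wide data (made-up): second tumor site, NA if there is none
toy_wide <- data.frame(
  t_site_icd.2 = c("C34", NA, "C50", NA),
  stringsAsFactors = FALSE
)

# Same rules as in the step above, written with ifelse()
toy_wide$p_spc     <- ifelse(is.na(toy_wide$t_site_icd.2), "No SPC", "SPC developed")
toy_wide$count_spc <- ifelse(is.na(toy_wide$t_site_icd.2), 1, 0)

# Cross-tabulate label against numeric value
check <- table(toy_wide$p_spc, toy_wide$count_spc)
check
```

Every row falls into exactly one cell per label, so the table shows the direction of the coding at a glance.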

Step 6 - Determine patient status at end of FU

usdata_wide <- usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = FALSE, check = TRUE, 
             as_labelled_factor = TRUE)
#> # A tibble: 10 × 3
#>    p_alive.1 p_status                                                          n
#>    <chr>     <fct>                                                         <int>
#>  1 Alive     Patient alive after FC (with or without following SPC after …  5986
#>  2 Alive     Patient alive after SPC                                       11421
#>  3 Alive     NA - Patient not born before end of FU                            4
#>  4 Alive     NA - Patient did not develop cancer before end of FU            873
#>  5 Dead      Patient alive after FC (with or without following SPC after …   909
#>  6 Dead      Patient alive after SPC                                        1294
#>  7 Dead      Patient dead after FC                                          6116
#>  8 Dead      Patient dead after SPC                                         5286
#>  9 Dead      NA - Patient did not develop cancer before end of FU             44
#> 10 Dead      NA - Patient date of death is missing                            64
#> # A tibble: 7 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6895
#> 2 Patient alive after SPC                                                12715
#> 3 Patient dead after FC                                                   6116
#> 4 Patient dead after SPC                                                  5286
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> 7 NA - Patient date of death is missing                                     64

usdata_wide %>%
   dplyr::select(fake_id, p_status, p_alive.1, datedeath.1, t_site_icd.1, t_datediag.1, 
                 t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#>    fake_id p_status p_alive.1 datedeath.1 t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <fct>    <chr>     <date>      <chr>        <date>       <chr>       
#>  1 100004  Patient… Alive     NA          C50          1992-07-15   C54         
#>  2 100039  Patient… Alive     NA          C50          2003-08-15   C34         
#>  3 100073  Patient… Dead      2012-06-01  C44          1993-11-15   C34         
#>  4 100143  Patient… Alive     NA          C50          1992-03-15   C34         
#>  5 100182  Patient… Alive     NA          C18          1991-09-15   C34         
#>  6 100197  Patient… Alive     NA          C34          2012-06-15   C50         
#>  7 100208  NA - Pa… Dead      2019-11-15  C34          2019-11-15   <NA>        
#>  8 100230  Patient… Alive     NA          C44          1992-11-15   C34         
#>  9 100234  Patient… Alive     NA          C34          2010-02-15   <NA>        
#> 10 100266  Patient… Dead      2010-07-15  C34          2010-07-15   <NA>        
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>

#alternatively, you can impute the date of death using lifedatmin_var
usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = TRUE, lifedatmin_var = "p_dodmin.1", 
             check = TRUE, as_labelled_factor = TRUE)
#> # A tibble: 9 × 3
#>   p_alive.1 p_status                                                           n
#>   <chr>     <fct>                                                          <int>
#> 1 Alive     Patient alive after FC (with or without following SPC after e…  5986
#> 2 Alive     Patient alive after SPC                                        11421
#> 3 Alive     NA - Patient not born before end of FU                             4
#> 4 Alive     NA - Patient did not develop cancer before end of FU             873
#> 5 Dead      Patient alive after FC (with or without following SPC after e…   913
#> 6 Dead      Patient alive after SPC                                         1295
#> 7 Dead      Patient dead after FC                                           6138
#> 8 Dead      Patient dead after SPC                                          5323
#> 9 Dead      NA - Patient did not develop cancer before end of FU              44
#> # A tibble: 6 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6899
#> 2 Patient alive after SPC                                                12716
#> 3 Patient dead after FC                                                   6138
#> 4 Patient dead after SPC                                                  5323
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> # A tibble: 31,997 × 139
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi… Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec… Fema… White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr… Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec… Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec… Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec… Fema… White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec… Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr… Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr… Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr… Fema… White  1956-01-01  2010-07-15  
#> # ℹ 31,987 more rows
#> # ℹ 132 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> #   fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> #   fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> #   sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> #   t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> #   datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
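The status categories can be read as a set of date comparisons against the end of follow-up. The following base R sketch is a simplified, hypothetical reading of that logic: the function name `status_sketch`, the shortened labels, and the priority of the checks are assumptions, and the sketch assumes non-missing birth and first-cancer dates. It is not the implementation of `pat_status()`.

```r
# Hypothetical, simplified status rule. Later assignments override earlier
# ones, so the checks at the bottom have the highest priority.
status_sketch <- function(birth, fc, spc, death, dead, fu_end) {
  fu_end     <- as.Date(fu_end)
  spc_in_fu  <- !is.na(spc) & spc <= fu_end            # SPC diagnosed within FU
  died_in_fu <- dead & !is.na(death) & death <= fu_end # death within FU
  status <- rep("alive after FC", length(fc))
  status[spc_in_fu]              <- "alive after SPC"
  status[died_in_fu]             <- "dead after FC"
  status[died_in_fu & spc_in_fu] <- "dead after SPC"
  status[dead & is.na(death)]    <- "NA - date of death missing"
  status[fc > fu_end]            <- "NA - no cancer before end of FU"
  status[birth > fu_end]         <- "NA - not born before end of FU"
  status
}

# Toy patients (made-up dates)
toy <- data.frame(
  birth = as.Date(c("1926-01-01", "1970-01-01", "1956-01-01", "1988-01-01", "1950-01-01")),
  fc    = as.Date(c("1992-07-15", "2019-11-15", "2010-07-15", "2010-02-15", "2005-03-15")),
  spc   = as.Date(c("2004-01-15", NA, NA, NA, NA)),
  death = as.Date(c(NA, "2019-11-15", "2010-07-15", NA, "2019-03-01")),
  dead  = c(FALSE, TRUE, TRUE, FALSE, TRUE)
)
toy$status <- status_sketch(toy$birth, toy$fc, toy$spc, toy$death, toy$dead,
                            fu_end = "2017-12-31")
toy[, c("fc", "death", "status")]
```

The last toy patient died after the end of follow-up and is therefore counted as alive at the end of follow-up, which corresponds to the rows above where `p_alive.1` is "Dead" but the status is "Patient alive after ...".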

Step 6b - Remove patients irrelevant to analysis depending on status

usdata_wide <- usdata_wide %>%
  dplyr::filter(!p_status %in% c("NA - Patient not born before end of FU",
                                 "NA - Patient did not develop cancer before end of FU",
                                 "NA - Patient date of death is missing"))

usdata_wide %>%
  dplyr::count(p_status)
#> # A tibble: 4 × 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6895
#> 2 Patient alive after SPC                                                12715
#> 3 Patient dead after FC                                                   6116
#> 4 Patient dead after SPC                                                  5286

Step 7 - Calculate FU time

usdata_wide <- usdata_wide %>%
   calc_futime(., futime_var_new = "p_futimeyrs", fu_end = "2017-12-31",
               dattype = "seer", time_unit = "years", 
               lifedat_var = "datedeath.1", 
               fcdat_var = "t_datediag.1", spcdat_var = "t_datediag.2")
#> # A tibble: 4 × 5
#>   p_status                       mean_futime min_futime max_futime median_futime
#>   <fct>                                <dbl>      <dbl>      <dbl>         <dbl>
#> 1 Patient alive after FC (with …        9.56     0.0438       27.0          8.29
#> 2 Patient alive after SPC               8.70     0            26.9          7.50
#> 3 Patient dead after FC                 8.60     0            25.9          7.54
#> 4 Patient dead after SPC                6.29     0            25.3          5.17

usdata_wide %>%
   dplyr::select(fake_id, p_status, p_futimeyrs, p_alive.1, datedeath.1, t_datediag.1, t_datediag.2)
#> # A tibble: 31,012 × 7
#>    fake_id p_status  p_futimeyrs p_alive.1 datedeath.1 t_datediag.1 t_datediag.2
#>    <chr>   <fct>           <dbl> <chr>     <date>      <date>       <date>      
#>  1 100004  Patient …       11.5  Alive     NA          1992-07-15   2004-01-15  
#>  2 100039  Patient …        7.67 Alive     NA          2003-08-15   2011-04-15  
#>  3 100073  Patient …       10.1  Dead      2012-06-01  1993-11-15   2003-12-15  
#>  4 100143  Patient …        3.33 Alive     NA          1992-03-15   1995-07-15  
#>  5 100182  Patient …        7.08 Alive     NA          1991-09-15   1998-10-15  
#>  6 100197  Patient …        4.83 Alive     NA          2012-06-15   2017-04-15  
#>  7 100230  Patient …       11.0  Alive     NA          1992-11-15   2003-11-15  
#>  8 100234  Patient …        7.87 Alive     NA          2010-02-15   NA          
#>  9 100266  Patient …        0    Dead      2010-07-15  2010-07-15   NA          
#> 10 100274  Patient …        7.38 Dead      2011-06-01  2004-01-15   NA          
#> # ℹ 31,002 more rows
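Follow-up time can be understood as the time from first cancer diagnosis to the earliest of second cancer diagnosis, death, or end of follow-up. The base R sketch below implements that reading; the helper name `futime_sketch` and the divisor of 365.25 days per year are assumptions and not taken from the package source. The toy dates are copied from three rows printed above (IDs 100004, 100234 and 100266), and the sketch reproduces the rounded values shown there.

```r
# Hypothetical helper: follow-up time in years.
# Dates are converted to day counts so that pmin() can ignore missing values.
futime_sketch <- function(fc, spc, death, fu_end) {
  end_num <- pmin(as.numeric(spc),
                  as.numeric(death),
                  as.numeric(as.Date(fu_end)),
                  na.rm = TRUE)
  (end_num - as.numeric(fc)) / 365.25  # assumption: 365.25 days per year
}

toy <- data.frame(
  fc    = as.Date(c("1992-07-15", "2010-02-15", "2010-07-15")),
  spc   = as.Date(c("2004-01-15", NA, NA)),
  death = as.Date(c(NA, NA, "2010-07-15"))
)
toy$futime_yrs <- futime_sketch(toy$fc, toy$spc, toy$death, fu_end = "2017-12-31")
round(toy$futime_yrs, 2)
```

A follow-up time of 0 arises when the date of death equals the date of first diagnosis, as for the third toy patient.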

Step 8 - Calculate SIR

sircalc_results <- usdata_wide %>%
  sir_byfutime(
    dattype = "seer",
    ybreak_vars = c("race.1", "t_dco.1"),
    xbreak_var = "none",
    futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
    count_var = "count_spc",
    refrates_df = us_refrates_icd2,
    calc_total_row = TRUE,
    calc_total_fu = TRUE,
    region_var = "registry.1",
    age_var = "fc_agegroup.1",
    sex_var = "sex.1",
    year_var = "t_yeardiag.1",
    race_var = "race.1",
    site_var = "t_site_icd.1", #using grouping by second cancer incidence
    futime_var = "p_futimeyrs",
    alpha = 0.05)
#> 
#> [INFO Cases 0 PYARs] There are conflicts where strata with 0 follow-up time have data in observed.
#> ℹ 30 strata are affected.
#>  - This might be caused by cases where SPC occured at the same day as first cancer.
#>  - You can check this by excluding all cases from wide_df, where date of first diagnosis is equal.
#> ! Check attribute `problems_not_empty` of results to see what strata are affected.
#>  [INFO Unexpected Cases] There are observed cases in the results file that do not occur in the refrates_df.
#> ℹ 2665 strata are affected.
#> A possible explanation can be:
#>  - DCO cases or
#>  - diagnosis of second cancer occured in different time period than first cancer
#> ! Check attribute `notes_refcases` of results to see what strata are affected.
#> 

sircalc_results %>% print(n = 100)
#> # A tidytable: 421,430 × 22
#>     age    region sex   race  year  yvar_name yvar_label fu_time t_site observed
#>     <chr>  <chr>  <chr> <chr> <chr> <chr>     <chr>      <chr>   <chr>     <dbl>
#>   1 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C14           0
#>   2 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C18           0
#>   3 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C34           0
#>   4 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C44           0
#>   5 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C50           0
#>   6 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C54           0
#>   7 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C64           0
#>   8 00 - … SEER … Fema… Black 1990… total_var Overall    to 1 m… C80           0
#>   9 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C14           0
#>  10 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C18           0
#>  11 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C34           0
#>  12 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C44           0
#>  13 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C50           0
#>  14 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C54           0
#>  15 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C64           0
#>  16 00 - … SEER … Fema… Black 1990… total_var Overall    0.0833… C80           0
#>  17 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C14           0
#>  18 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C18           0
#>  19 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C34           0
#>  20 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C44           0
#>  21 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C50           0
#>  22 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C54           0
#>  23 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C64           0
#>  24 00 - … SEER … Fema… Black 1990… total_var Overall    0.167-… C80           0
#>  25 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C14           0
#>  26 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C18           0
#>  27 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C34           0
#>  28 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C44           0
#>  29 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C50           0
#>  30 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C54           0
#>  31 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C64           0
#>  32 00 - … SEER … Fema… Black 1990… total_var Overall    1-5 ye… C80           0
#>  33 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C14           0
#>  34 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C18           0
#>  35 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C34           0
#>  36 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C44           0
#>  37 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C50           0
#>  38 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C54           0
#>  39 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C64           0
#>  40 00 - … SEER … Fema… Black 1990… total_var Overall    5-10 y… C80           0
#>  41 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C14           0
#>  42 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C18           0
#>  43 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C34           1
#>  44 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C44           0
#>  45 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C50           0
#>  46 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C54           0
#>  47 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C64           0
#>  48 00 - … SEER … Fema… Black 1990… total_var Overall    10+ ye… C80           0
#>  49 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C14           0
#>  50 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C18           0
#>  51 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C34           1
#>  52 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C44           0
#>  53 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C50           0
#>  54 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C54           0
#>  55 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C64           0
#>  56 00 - … SEER … Fema… Black 1990… total_var Overall    Total … C80           0
#>  57 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C14           0
#>  58 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C18           0
#>  59 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C34           0
#>  60 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C44           0
#>  61 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C50           0
#>  62 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C54           0
#>  63 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C64           0
#>  64 00 - … SEER … Fema… Black 1990… race.1    Black      to 1 m… C80           0
#>  65 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C14           0
#>  66 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C18           0
#>  67 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C34           0
#>  68 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C44           0
#>  69 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C50           0
#>  70 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C54           0
#>  71 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C64           0
#>  72 00 - … SEER … Fema… Black 1990… race.1    Black      0.0833… C80           0
#>  73 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C14           0
#>  74 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C18           0
#>  75 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C34           0
#>  76 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C44           0
#>  77 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C50           0
#>  78 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C54           0
#>  79 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C64           0
#>  80 00 - … SEER … Fema… Black 1990… race.1    Black      0.167-… C80           0
#>  81 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C14           0
#>  82 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C18           0
#>  83 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C34           0
#>  84 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C44           0
#>  85 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C50           0
#>  86 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C54           0
#>  87 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C64           0
#>  88 00 - … SEER … Fema… Black 1990… race.1    Black      1-5 ye… C80           0
#>  89 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C14           0
#>  90 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C18           0
#>  91 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C34           0
#>  92 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C44           0
#>  93 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C50           0
#>  94 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C54           0
#>  95 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C64           0
#>  96 00 - … SEER … Fema… Black 1990… race.1    Black      5-10 y… C80           0
#>  97 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C14           0
#>  98 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C18           0
#>  99 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C34           1
#> 100 00 - … SEER … Fema… Black 1990… race.1    Black      10+ ye… C44           0
#> # ℹ 421,330 more rows
#> # ℹ 12 more variables: expected <dbl>, sir <dbl>, sir_lci <dbl>, sir_uci <dbl>,
#> #   pyar <dbl>, n_base <dbl>, ref_inc_cases <dbl>, ref_population_pyar <dbl>,
#> #   ref_inc_crude_rate <dbl>, fu_time_sort <int>, yvar_sort <int>,
#> #   warning <chr>
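The `futime_breaks` argument defines follow-up intervals, and each patient's follow-up time contributes person-years to every interval it passes through. The base R sketch below shows one way to split follow-up time accordingly, on made-up follow-up times. It illustrates the general technique only; how `sir_byfutime()` treats interval boundaries is not stated here, so the half-open labels are an assumption.

```r
# Breaks as used in the call above (in years)
futime_breaks <- c(0, 1/12, 2/12, 1, 5, 10, Inf)
lower <- head(futime_breaks, -1)
upper <- tail(futime_breaks, -1)

# Made-up follow-up times in years for four patients
futime <- c(0.05, 0.5, 3, 12)

# Person-years each patient spends in each interval:
# rows = patients, columns = intervals
pyar_mat <- sapply(seq_along(lower), function(i) {
  pmax(0, pmin(futime, upper[i]) - lower[i])
})
colnames(pyar_mat) <- paste0("[", round(lower, 3), ", ", round(upper, 3), ")")

# Total person-years at risk per interval
pyar_by_interval <- colSums(pyar_mat)
pyar_by_interval
```

Summing a patient's row recovers that patient's total follow-up time, so no person-time is lost or counted twice by the split.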

Step 9 - Summarize SIR results

#The summarize function is versatile. As an example, here is a summary with minimal output.

sircalc_results %>%
  #summarize results across region, age, year, race and t_site
  summarize_sir_results(.,
                        summarize_groups = c("region", "age", "year", "race"),
                        summarize_site = TRUE,
                        output = "long",  output_information = "minimal",
                        add_total_row = "only",  add_total_fu = "no",
                        collapse_ci = FALSE,  shorten_total_cols = TRUE,
                        fubreak_var_name = "fu_time", ybreak_var_name = "yvar_name",
                        xbreak_var_name = "none", site_var_name = "t_site",
                        alpha = 0.05
                        ) %>%
  dplyr::select(-region, -age, -year, -race, -sex, -yvar_name)
#> Warning: The results file `sir_df` contains observed cases in i_observed that do not occur in the refrates_df (ref_inc_cases).
#> Therefore calculation of the variables n_base and ref_population_pyar is ambiguous.
#> We take the first value of each variable. Expect small inconsistencies in the calculation of n_base, ref_population_pyar and ref_inc_crude_rate across strata.
#> ! If you want to know more, please check the `warnings` column of `sir_df`.
#> # A tidytable: 7 × 8
#>   yvar_label fu_time          fu_time_sort t_site observed expected   sir sir_ci
#>   <chr>      <chr>                   <int> <chr>     <dbl>    <dbl> <dbl> <chr> 
#> 1 Overall    to 1 month                  1 Total       306     20.6 14.9  13.25…
#> 2 Overall    0.0833-0.167 ye…            2 Total        74     20.4  3.62 2.84 …
#> 3 Overall    0.167-1 years               3 Total       717    196.   3.65 3.39 …
#> 4 Overall    1-5 years                   4 Total      2995    760.   3.94 3.8 -…
#> 5 Overall    5-10 years                  5 Total      3113    605.   5.14 4.96 …
#> 6 Overall    10+ years                   6 Total      4254    502.   8.47 8.22 …
#> 7 Overall    Total 0 to Inf …            7 Total     11459   2105.   5.44 5.34 …
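When strata are collapsed, the summary SIR is the ratio of summed observed to summed expected cases, not the average of the stratum-level SIRs. The base R sketch below shows this on made-up numbers (`toy_strata` is a hypothetical data frame); it illustrates the principle and is not the implementation of `summarize_sir_results()`.

```r
# Toy stratum-level results (made-up numbers)
toy_strata <- data.frame(
  fu_time  = c("to 1 month", "to 1 month", "1-5 years", "1-5 years"),
  observed = c(2, 1, 10, 5),
  expected = c(0.4, 0.1, 3.0, 2.0),
  stringsAsFactors = FALSE
)

# Sum observed and expected within each follow-up interval,
# then take the ratio of the sums
agg <- aggregate(cbind(observed, expected) ~ fu_time, data = toy_strata, FUN = sum)
agg$sir <- agg$observed / agg$expected
agg
```

In the summary table above, the observed counts of the six follow-up intervals add up to the observed count in the total row, which is consistent with this way of aggregating.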

Built with

sessionInfo()
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: Europe/Berlin
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] msSPChelpR_0.9.1 magrittr_2.0.3   dplyr_1.1.4     
#> 
#> loaded via a namespace (and not attached):
#>  [1] jsonlite_1.8.8     compiler_4.3.2     tidyselect_1.2.0   stringr_1.5.1     
#>  [5] tidytable_0.10.2   tidyr_1.3.0        jquerylib_0.1.4    yaml_2.3.8        
#>  [9] fastmap_1.1.1      R6_2.5.1           generics_0.1.3     sjlabelled_1.2.0  
#> [13] knitr_1.45         forcats_1.0.0      tibble_3.2.1       insight_0.19.7    
#> [17] lubridate_1.9.3    bslib_0.6.1        pillar_1.9.0       rlang_1.1.3       
#> [21] utf8_1.2.4         stringi_1.8.3      cachem_1.0.8       xfun_0.41         
#> [25] sass_0.4.8         timechange_0.2.0   cli_3.6.2          withr_3.0.0       
#> [29] digest_0.6.34      rstudioapi_0.15.0  haven_2.5.4        hms_1.1.3         
#> [33] lifecycle_1.0.4    vctrs_0.6.5        data.table_1.14.10 evaluate_0.23     
#> [37] glue_1.7.0         fansi_1.0.6        rmarkdown_2.25     purrr_1.0.2       
#> [41] tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.7