This vignette explains how to use the following functions:

* calc_futime() to calculate follow-up time from the index event until the next event, death or end-of-follow-up date
* pat_status() to determine patient status at end of follow-up
* renumber_time_id() to calculate a consecutive index of events per case ID
* reshape_long() to transpose a dataset in wide format to long format
* reshape_wide() to transpose a dataset in long format to wide format (the wide format is required for many package functions)
* sir_byfutime() to calculate standardized incidence ratios (SIRs) with custom grouping variables, stratified by follow-up time
* summarize_sir_results() to summarize the detailed SIR results produced by sir_byfutime()
* vital_status() to determine vital status, i.e. whether a patient is alive or dead at end of follow-up

For some functions there are multiple variants that use different frameworks (e.g. renumber_time_id() and renumber_time_id_dt(), or reshape_wide(), reshape_wide_dt() and reshape_wide_tidyr()). They give the same results but differ in execution time and memory use.
It is recommended to run the following steps in the given order to obtain accurate follow-up time calculations:
1. Filter all cases in the long version of the dataset that are relevant for your analysis. Make sure that:
   * for every case_id, the index event (e.g. First Cancer FC) is still included and is the remaining row with the smallest time_id (TUMID3 variable for ZfKD data, SEQ_NUM for SEER data);
   * a case_id may or may not have a countable incident event (e.g. Second Primary Cancer SPC). If it is to be counted, this event must be the second entry per case_id (second smallest time_id);
   * count_var indicates whether the countable incident event (SPC) has occurred, coded 0 for non-occurrence (or an event that is not counted) and 1 for a counted incident event.
2. Renumber the filtered long dataset: run the helper function msSPChelpR::renumber_time_id_dt() (or the non-data.table variant msSPChelpR::renumber_time_id()) on the filtered long dataset. It renumbers all events per case_id and, if step 1 is fulfilled, assigns time_var_new = 1 to each index event and time_var_new = 2 to each second (possibly countable incident) event. SIR-related functions only count the second event if, in addition to time_var_new = 2, count_var = 1 is true for this row.
3. Reshape the dataset: run msSPChelpR::reshape_wide_dt() or the non-data.table variant msSPChelpR::reshape_wide() to transpose the dataset to wide format (one row per case_id), creating variables such as count_var.2.
4. Set the flag for Second Primary Cancer diagnosis: after filtering and reshaping, it is essential to set p_spc again, because later steps of the analysis use this variable.
5. Determine patient status at a defined end of follow-up using the msSPChelpR::pat_status() function. The end-of-follow-up date:
   * must be in “YYYY-MM-DD” format and is always defined via the fu_end = parameter;
   * must not be later than the end of data collection. E.g. if the last incident events in your dataset were collected at the end of 2014, fu_end must be "2014-12-31" or earlier.
6. Based on the newly calculated patient status, you might want to exclude cases for which patient status cannot be determined.
7. Calculate follow-up time using the msSPChelpR::calc_futime() function and the same fu_end as used for pat_status(). By default, the msSPChelpR functions expect follow-up times as numeric values in years.

In order to calculate SIRs using the package functions, the following data structure is needed:

* Wide format data wide_df with one row per patient who has experienced the index event (i.e. was diagnosed with a first primary cancer FC)
* wide_df needs to contain the following variables (columns) per patient (row):
  * region_var - variable in df that contains information on the region where the case was incident.
  * age_var - variable in df that contains information on the age group.
  * sex_var - variable in df that contains information on biological sex.
  * year_var - variable in df that contains information on the year or year period when the case was incident.
  * site_var - variable in df that contains information on the case (count event) diagnosis. Cases are usually the second cancers. Diagnoses can use any coding system (e.g. ICD), but the coding system must be consistent between the dataset and the reference data.
  * futime_var - variable in df that contains the follow-up time per person (in years) from the date of first cancer until death, the date of the event (case) or the end-of-FU date, whichever comes first. If you have not calculated the FU time yet, you can use the workflow described in the previous chapter.

If your data has the required structure, you can calculate and summarize SIRs in the following two steps:
1. Calculate SIRs using the msSPChelpR::sir_byfutime() function. This calculation usually requires a reference dataset (refrates_df) that defines the population standard rates. refrates_df must use the same category coding of age, sex, region, year and cancer site as age_var, sex_var, region_var, year_var and site_var.
2. Summarize the stratified SIR results produced by the previous step using the msSPChelpR::summarize_sir_results() function.

A future version of this vignette will explain in this chapter the theoretical considerations behind how SIRs are calculated.
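Until then, the basic arithmetic can be summarized as a minimal sketch: the SIR is the number of observed cases divided by the number of expected cases, where the expected number sums person-years at risk multiplied by the reference incidence rate over all strata. The helper sir_sketch() below is hypothetical and not part of msSPChelpR; the exact Poisson (chi-square based) confidence interval it uses is an assumption, and sir_byfutime() may use a different interval method.

```r
# Hypothetical helper (not part of msSPChelpR): SIR = observed / expected,
# pooled over all strata passed in.
sir_sketch <- function(observed, pyar, ref_rate, alpha = 0.05) {
  # expected cases: person-years at risk times reference incidence rate
  expected <- sum(pyar * ref_rate)
  o <- sum(observed)
  # exact Poisson confidence limits via the chi-square distribution (assumption)
  lci <- if (o == 0) 0 else stats::qchisq(alpha / 2, 2 * o) / 2 / expected
  uci <- stats::qchisq(1 - alpha / 2, 2 * (o + 1)) / 2 / expected
  c(observed = o, expected = expected, sir = o / expected,
    sir_lci = lci, sir_uci = uci)
}

# Toy example with two strata; reference rates are cases per person-year
sir_sketch(observed = c(3, 2), pyar = c(1000, 500), ref_rate = c(0.002, 0.004))
```

Here the expected count is 1000 × 0.002 + 500 × 0.004 = 4 against 5 observed cases, so the SIR is 1.25. Pooling observed and expected counts before dividing, rather than averaging stratum-specific SIRs, keeps strata with few person-years from dominating the result.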
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
library(msSPChelpR)
#Load synthetic dataset of patients with cancer to demonstrate package functions
data("us_second_cancer")
#This dataset is in long format, so each tumor is a separate row in the data
us_second_cancer
#> # A tibble: 113,999 × 16
#> fake_id SEQ_NUM registry sex race datebirth t_datediag t_site_icd t_dco
#> <chr> <int> <chr> <chr> <chr> <date> <date> <chr> <chr>
#> 1 100004 1 SEER Reg … Male White 1926-01-01 1992-07-15 C50 hist…
#> 2 100004 2 SEER Reg … Male White 1926-01-01 2004-01-15 C54 hist…
#> 3 100004 3 SEER Reg … Male White 1926-01-01 2006-06-15 C34 hist…
#> 4 100004 4 SEER Reg … Male White 1926-01-01 2018-06-15 C14 DCO …
#> 5 100034 1 SEER Reg … Male White 1979-01-01 2000-06-15 C50 hist…
#> 6 100037 1 SEER Reg … Fema… White 1938-01-01 1996-01-15 C54 hist…
#> 7 100038 1 SEER Reg … Male White 1989-01-01 1991-04-15 C50 hist…
#> 8 100038 2 SEER Reg … Male White 1989-01-01 2000-03-15 C80 hist…
#> 9 100039 1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50 hist…
#> 10 100039 2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34 hist…
#> # ℹ 113,989 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> # p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
#filter for patients with any lung cancer diagnosis
ids <- us_second_cancer %>%
#detect ids with any lung cancer
filter(t_site_icd == "C34") %>%
pull(fake_id) %>%
unique()
filtered_usdata <- us_second_cancer %>%
#filter according to above detected ids with any lung cancer diagnosis
filter(fake_id %in% ids) %>%
arrange(fake_id)
filtered_usdata
#> # A tibble: 62,661 × 16
#> fake_id SEQ_NUM registry sex race datebirth t_datediag t_site_icd t_dco
#> <chr> <int> <chr> <chr> <chr> <date> <date> <chr> <chr>
#> 1 100004 1 SEER Reg … Male White 1926-01-01 1992-07-15 C50 hist…
#> 2 100004 2 SEER Reg … Male White 1926-01-01 2004-01-15 C54 hist…
#> 3 100004 3 SEER Reg … Male White 1926-01-01 2006-06-15 C34 hist…
#> 4 100004 4 SEER Reg … Male White 1926-01-01 2018-06-15 C14 DCO …
#> 5 100039 1 SEER Reg … Fema… White 1946-01-01 2003-08-15 C50 hist…
#> 6 100039 2 SEER Reg … Fema… White 1946-01-01 2011-04-15 C34 hist…
#> 7 100039 3 SEER Reg … Fema… White 1946-01-01 2018-01-15 C80 hist…
#> 8 100073 1 SEER Reg … Male White 1960-01-01 1993-11-15 C44 hist…
#> 9 100073 2 SEER Reg … Male White 1960-01-01 2003-12-15 C34 hist…
#> 10 100143 1 SEER Reg … Male White 1944-01-01 1992-03-15 C50 hist…
#> # ℹ 62,651 more rows
#> # ℹ 7 more variables: t_hist <int>, fc_age <int>, datedeath <date>,
#> # p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>, t_yeardiag <chr>
#renumber time_id: consecutive index of tumors per case ID
renumbered_usdata <- filtered_usdata %>%
renumber_time_id(new_time_id_var = "t_tumid",
dattype = "seer",
case_id_var = "fake_id")
renumbered_usdata %>%
select(fake_id, sex, t_site_icd, t_datediag, t_tumid)
#> # A tibble: 62,661 × 5
#> fake_id sex t_site_icd t_datediag t_tumid
#> <chr> <chr> <chr> <date> <int>
#> 1 100004 Male C50 1992-07-15 1
#> 2 100004 Male C54 2004-01-15 2
#> 3 100004 Male C34 2006-06-15 3
#> 4 100004 Male C14 2018-06-15 4
#> 5 100039 Female C50 2003-08-15 1
#> 6 100039 Female C34 2011-04-15 2
#> 7 100039 Female C80 2018-01-15 3
#> 8 100073 Male C44 1993-11-15 1
#> 9 100073 Male C34 2003-12-15 2
#> 10 100143 Male C50 1992-03-15 1
#> # ℹ 62,651 more rows
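Conceptually, renumbering sorts the events of each case and numbers them consecutively, so that gaps left by filtering disappear. The base-R function renumber_sketch() below is a hypothetical illustration of that idea only; it is not the package implementation, which may, for example, apply dattype-specific sorting rules.

```r
# Hypothetical sketch (not msSPChelpR code): consecutive event index per case ID
renumber_sketch <- function(df, case_id_var, order_var, new_time_id_var) {
  # sort events within each case by the original sequence variable
  df <- df[order(df[[case_id_var]], df[[order_var]]), , drop = FALSE]
  # number rows 1, 2, 3, ... within each case ID
  df[[new_time_id_var]] <- stats::ave(seq_len(nrow(df)), df[[case_id_var]],
                                      FUN = seq_along)
  rownames(df) <- NULL
  df
}

# toy long data: unsorted rows, and SEQ_NUM has gaps after filtering
toy_long <- data.frame(
  fake_id    = c("B", "A", "A", "B", "A"),
  SEQ_NUM    = c(5L, 3L, 1L, 2L, 4L),
  t_site_icd = c("C80", "C34", "C50", "C34", "C14")
)

# t_tumid becomes 1, 2, 3 for case A and 1, 2 for case B
renumber_sketch(toy_long, "fake_id", "SEQ_NUM", "t_tumid")
```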
usdata_wide <- renumbered_usdata %>%
reshape_wide_tidyr(case_id_var = "fake_id", time_id_var = "t_tumid", timevar_max = 10)
#now the data is in the wide format required by many package functions.
#This means each case is one row, and the tumors per case ID are
#added as new columns, using the time_id as column name suffix.
usdata_wide
#> # A tibble: 31,997 × 136
#> fake_id SEQ_NUM.1 registry.1 sex.1 race.1 datebirth.1 t_datediag.1
#> <chr> <int> <chr> <chr> <chr> <date> <date>
#> 1 100004 1 SEER Reg 20 - Detroi… Male White 1926-01-01 1992-07-15
#> 2 100039 1 SEER Reg 02 - Connec… Fema… White 1946-01-01 2003-08-15
#> 3 100073 1 SEER Reg 01 - San Fr… Male White 1960-01-01 1993-11-15
#> 4 100143 1 SEER Reg 02 - Connec… Male White 1944-01-01 1992-03-15
#> 5 100182 1 SEER Reg 02 - Connec… Male Other 1927-01-01 1991-09-15
#> 6 100197 1 SEER Reg 02 - Connec… Fema… White 1945-01-01 2012-06-15
#> 7 100208 1 SEER Reg 02 - Connec… Male White 1970-01-01 2019-11-15
#> 8 100230 1 SEER Reg 01 - San Fr… Male White 1947-01-01 1992-11-15
#> 9 100234 1 SEER Reg 01 - San Fr… Male White 1988-01-01 2010-02-15
#> 10 100266 1 SEER Reg 01 - San Fr… Fema… White 1956-01-01 2010-07-15
#> # ℹ 31,987 more rows
#> # ℹ 129 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> # fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> # fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> # sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> # t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> # datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
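The long-to-wide idea can be reproduced with base R's stats::reshape() on a toy dataset: each case ID becomes one row, and each variable gets the time_id as a column-name suffix, matching the naming visible in the output above (e.g. t_site_icd.2). This is only an illustration, not how reshape_wide_tidyr() is implemented. The last step derives a count indicator following the coding described in the workflow (1 = counted SPC, 0 = no SPC).

```r
# toy long data with the renumbered time_id already in place
toy_long <- data.frame(
  fake_id    = c("A", "A", "B"),
  t_tumid    = c(1L, 2L, 1L),
  t_site_icd = c("C50", "C34", "C34"),
  t_datediag = as.Date(c("1992-07-15", "2006-06-15", "2010-02-15"))
)

# one row per fake_id; columns t_site_icd.1, t_datediag.1, t_site_icd.2, ...
toy_wide <- stats::reshape(toy_long, direction = "wide",
                           idvar = "fake_id", timevar = "t_tumid", sep = ".")

# count indicator: 1 if a second tumor exists, 0 otherwise
toy_wide$count_spc <- as.integer(!is.na(toy_wide$t_site_icd.2))
toy_wide
```

Case B has no second tumor, so its .2 columns are NA and its count_spc is 0.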
#set the flag p_spc for second primary cancer diagnosis
usdata_wide <- usdata_wide %>%
dplyr::mutate(p_spc = dplyr::case_when(is.na(t_site_icd.2) ~ "No SPC",
!is.na(t_site_icd.2) ~ "SPC developed",
TRUE ~ NA_character_)) %>%
#create the same information as numeric variable count_spc
#(coded 1 for a counted SPC and 0 for no SPC, as count_var requires)
dplyr::mutate(count_spc = dplyr::case_when(!is.na(t_site_icd.2) ~ 1,
TRUE ~ 0))
usdata_wide %>%
dplyr::select(fake_id, sex.1, p_spc, count_spc, t_site_icd.1,
t_datediag.1, t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#> fake_id sex.1 p_spc count_spc t_site_icd.1 t_datediag.1 t_site_icd.2
#> <chr> <chr> <chr> <dbl> <chr> <date> <chr>
#> 1 100004 Male SPC developed 1 C50 1992-07-15 C54
#> 2 100039 Female SPC developed 1 C50 2003-08-15 C34
#> 3 100073 Male SPC developed 1 C44 1993-11-15 C34
#> 4 100143 Male SPC developed 1 C50 1992-03-15 C34
#> 5 100182 Male SPC developed 1 C18 1991-09-15 C34
#> 6 100197 Female SPC developed 1 C34 2012-06-15 C50
#> 7 100208 Male No SPC 0 C34 2019-11-15 <NA>
#> 8 100230 Male SPC developed 1 C44 1992-11-15 C34
#> 9 100234 Male No SPC 0 C34 2010-02-15 <NA>
#> 10 100266 Female No SPC 0 C34 2010-07-15 <NA>
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>
usdata_wide <- usdata_wide %>%
pat_status(., fu_end = "2017-12-31", dattype = "seer",
status_var = "p_status", life_var = "p_alive.1",
spc_var = "p_spc", birthdat_var = "datebirth.1",
lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
use_lifedatmin = FALSE, check = TRUE,
as_labelled_factor = TRUE)
#> # A tibble: 10 × 3
#> p_alive.1 p_status n
#> <chr> <fct> <int>
#> 1 Alive Patient alive after FC (with or without following SPC after … 5986
#> 2 Alive Patient alive after SPC 11421
#> 3 Alive NA - Patient not born before end of FU 4
#> 4 Alive NA - Patient did not develop cancer before end of FU 873
#> 5 Dead Patient alive after FC (with or without following SPC after … 909
#> 6 Dead Patient alive after SPC 1294
#> 7 Dead Patient dead after FC 6116
#> 8 Dead Patient dead after SPC 5286
#> 9 Dead NA - Patient did not develop cancer before end of FU 44
#> 10 Dead NA - Patient date of death is missing 64
#> # A tibble: 7 × 2
#> p_status n
#> <fct> <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU) 6895
#> 2 Patient alive after SPC 12715
#> 3 Patient dead after FC 6116
#> 4 Patient dead after SPC 5286
#> 5 NA - Patient not born before end of FU 4
#> 6 NA - Patient did not develop cancer before end of FU 917
#> 7 NA - Patient date of death is missing 64
usdata_wide %>%
dplyr::select(fake_id, p_status, p_alive.1, datedeath.1, t_site_icd.1, t_datediag.1,
t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 × 8
#> fake_id p_status p_alive.1 datedeath.1 t_site_icd.1 t_datediag.1 t_site_icd.2
#> <chr> <fct> <chr> <date> <chr> <date> <chr>
#> 1 100004 Patient… Alive NA C50 1992-07-15 C54
#> 2 100039 Patient… Alive NA C50 2003-08-15 C34
#> 3 100073 Patient… Dead 2012-06-01 C44 1993-11-15 C34
#> 4 100143 Patient… Alive NA C50 1992-03-15 C34
#> 5 100182 Patient… Alive NA C18 1991-09-15 C34
#> 6 100197 Patient… Alive NA C34 2012-06-15 C50
#> 7 100208 NA - Pa… Dead 2019-11-15 C34 2019-11-15 <NA>
#> 8 100230 Patient… Alive NA C44 1992-11-15 C34
#> 9 100234 Patient… Alive NA C34 2010-02-15 <NA>
#> 10 100266 Patient… Dead 2010-07-15 C34 2010-07-15 <NA>
#> # ℹ 31,987 more rows
#> # ℹ 1 more variable: t_datediag.2 <date>
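The core decision behind the status categories can be sketched in base R. The function classify_status() below is hypothetical and deliberately simplified: it ignores lifedat_fu_end, imputation of the date of death and the "not born before end of FU" check, and its labels only mirror the p_status categories shown above. Events after fu_end are treated as not (yet) observed. This is presumably why, in the table above, some patients recorded as Dead are still classified as alive after FC.

```r
# Hypothetical, simplified sketch of the patient status logic (not pat_status())
classify_status <- function(alive, date_death, date_fc, date_spc, fu_end) {
  fu_end <- as.Date(fu_end)
  # SPC and death only count if they happen on or before the end of follow-up
  spc_in_fu  <- !is.na(date_spc) & date_spc <= fu_end
  dead_in_fu <- alive == "Dead" & !is.na(date_death) & date_death <= fu_end
  status <- ifelse(dead_in_fu,
                   ifelse(spc_in_fu, "dead after SPC", "dead after FC"),
                   ifelse(spc_in_fu, "alive after SPC", "alive after FC"))
  # status cannot be determined: no cancer before end of FU, or death date missing
  status[which(date_fc > fu_end)] <- NA
  status[which(alive == "Dead" & is.na(date_death))] <- NA
  status
}

st <- classify_status(
  alive      = c("Alive", "Dead", "Dead", "Alive", "Dead"),
  date_death = as.Date(c(NA, "2012-06-01", "2019-03-01", NA, "2016-01-01")),
  date_fc    = as.Date(c("1992-07-15", "1993-11-15", "2010-02-15",
                         "2019-11-15", "2005-01-01")),
  date_spc   = as.Date(c("2004-01-15", "2003-12-15", NA, NA, "2018-05-01")),
  fu_end     = "2017-12-31"
)
st
```

The third patient died in 2019, after fu_end, and is therefore still "alive after FC"; the fifth developed an SPC only after fu_end, so the death in 2016 counts as "dead after FC".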
#alternatively, you can impute the date of death using lifedatmin_var
usdata_wide %>%
pat_status(., fu_end = "2017-12-31", dattype = "seer",
status_var = "p_status", life_var = "p_alive.1",
spc_var = "p_spc", birthdat_var = "datebirth.1",
lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
use_lifedatmin = TRUE, lifedatmin_var = "p_dodmin.1",
check = TRUE, as_labelled_factor = TRUE)
#> # A tibble: 9 × 3
#> p_alive.1 p_status n
#> <chr> <fct> <int>
#> 1 Alive Patient alive after FC (with or without following SPC after e… 5986
#> 2 Alive Patient alive after SPC 11421
#> 3 Alive NA - Patient not born before end of FU 4
#> 4 Alive NA - Patient did not develop cancer before end of FU 873
#> 5 Dead Patient alive after FC (with or without following SPC after e… 913
#> 6 Dead Patient alive after SPC 1295
#> 7 Dead Patient dead after FC 6138
#> 8 Dead Patient dead after SPC 5323
#> 9 Dead NA - Patient did not develop cancer before end of FU 44
#> # A tibble: 6 × 2
#> p_status n
#> <fct> <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU) 6899
#> 2 Patient alive after SPC 12716
#> 3 Patient dead after FC 6138
#> 4 Patient dead after SPC 5323
#> 5 NA - Patient not born before end of FU 4
#> 6 NA - Patient did not develop cancer before end of FU 917
#> # A tibble: 31,997 × 139
#> fake_id SEQ_NUM.1 registry.1 sex.1 race.1 datebirth.1 t_datediag.1
#> <chr> <int> <chr> <chr> <chr> <date> <date>
#> 1 100004 1 SEER Reg 20 - Detroi… Male White 1926-01-01 1992-07-15
#> 2 100039 1 SEER Reg 02 - Connec… Fema… White 1946-01-01 2003-08-15
#> 3 100073 1 SEER Reg 01 - San Fr… Male White 1960-01-01 1993-11-15
#> 4 100143 1 SEER Reg 02 - Connec… Male White 1944-01-01 1992-03-15
#> 5 100182 1 SEER Reg 02 - Connec… Male Other 1927-01-01 1991-09-15
#> 6 100197 1 SEER Reg 02 - Connec… Fema… White 1945-01-01 2012-06-15
#> 7 100208 1 SEER Reg 02 - Connec… Male White 1970-01-01 2019-11-15
#> 8 100230 1 SEER Reg 01 - San Fr… Male White 1947-01-01 1992-11-15
#> 9 100234 1 SEER Reg 01 - San Fr… Male White 1988-01-01 2010-02-15
#> 10 100266 1 SEER Reg 01 - San Fr… Fema… White 1956-01-01 2010-07-15
#> # ℹ 31,987 more rows
#> # ℹ 132 more variables: t_site_icd.1 <chr>, t_dco.1 <chr>, t_hist.1 <int>,
#> # fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>, p_dodmin.1 <date>,
#> # fc_agegroup.1 <chr>, t_yeardiag.1 <chr>, SEQ_NUM.2 <int>, registry.2 <chr>,
#> # sex.2 <chr>, race.2 <chr>, datebirth.2 <date>, t_datediag.2 <date>,
#> # t_site_icd.2 <chr>, t_dco.2 <chr>, t_hist.2 <int>, fc_age.2 <int>,
#> # datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>, …
usdata_wide <- usdata_wide %>%
dplyr::filter(!p_status %in% c("NA - Patient not born before end of FU",
"NA - Patient did not develop cancer before end of FU",
"NA - Patient date of death is missing"))
usdata_wide %>%
dplyr::count(p_status)
#> # A tibble: 4 × 2
#> p_status n
#> <fct> <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU) 6895
#> 2 Patient alive after SPC 12715
#> 3 Patient dead after FC 6116
#> 4 Patient dead after SPC 5286
usdata_wide <- usdata_wide %>%
calc_futime(., futime_var_new = "p_futimeyrs", fu_end = "2017-12-31",
dattype = "seer", time_unit = "years",
lifedat_var = "datedeath.1",
fcdat_var = "t_datediag.1", spcdat_var = "t_datediag.2")
#> # A tibble: 4 × 5
#> p_status mean_futime min_futime max_futime median_futime
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 Patient alive after FC (with … 9.56 0.0438 27.0 8.29
#> 2 Patient alive after SPC 8.70 0 26.9 7.50
#> 3 Patient dead after FC 8.60 0 25.9 7.54
#> 4 Patient dead after SPC 6.29 0 25.3 5.17
usdata_wide %>%
dplyr::select(fake_id, p_status, p_futimeyrs, p_alive.1, datedeath.1, t_datediag.1, t_datediag.2)
#> # A tibble: 31,012 × 7
#> fake_id p_status p_futimeyrs p_alive.1 datedeath.1 t_datediag.1 t_datediag.2
#> <chr> <fct> <dbl> <chr> <date> <date> <date>
#> 1 100004 Patient … 11.5 Alive NA 1992-07-15 2004-01-15
#> 2 100039 Patient … 7.67 Alive NA 2003-08-15 2011-04-15
#> 3 100073 Patient … 10.1 Dead 2012-06-01 1993-11-15 2003-12-15
#> 4 100143 Patient … 3.33 Alive NA 1992-03-15 1995-07-15
#> 5 100182 Patient … 7.08 Alive NA 1991-09-15 1998-10-15
#> 6 100197 Patient … 4.83 Alive NA 2012-06-15 2017-04-15
#> 7 100230 Patient … 11.0 Alive NA 1992-11-15 2003-11-15
#> 8 100234 Patient … 7.87 Alive NA 2010-02-15 NA
#> 9 100266 Patient … 0 Dead 2010-07-15 2010-07-15 NA
#> 10 100274 Patient … 7.38 Dead 2011-06-01 2004-01-15 NA
#> # ℹ 31,002 more rows
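A minimal sketch of the follow-up time rule described in the workflow (time from first cancer until SPC diagnosis, death or fu_end, whichever comes first) can be written in base R. calc_futime_sketch() is hypothetical, and dividing days by 365.25 is an assumption; for patients 100004, 100039, 100234 and 100073 shown above it does reproduce the printed p_futimeyrs values after rounding to the displayed precision.

```r
# Hypothetical sketch (not calc_futime()): follow-up time in years
calc_futime_sketch <- function(date_fc, date_spc, date_death, fu_end) {
  fu_end <- as.Date(fu_end)
  # follow-up ends at the earliest of SPC diagnosis, death or end of follow-up
  stop_date <- pmin(date_spc, date_death, fu_end, na.rm = TRUE)
  # convert days to years (365.25 days per year is an assumption)
  as.numeric(difftime(stop_date, date_fc, units = "days")) / 365.25
}

# patients 100004, 100039, 100234 and 100073 from the output above
fu <- calc_futime_sketch(
  date_fc    = as.Date(c("1992-07-15", "2003-08-15", "2010-02-15", "1993-11-15")),
  date_spc   = as.Date(c("2004-01-15", "2011-04-15", NA, "2003-12-15")),
  date_death = as.Date(c(NA, NA, NA, "2012-06-01")),
  fu_end     = "2017-12-31"
)
round(fu, 2)  # 11.50  7.67  7.87 10.08
```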
sircalc_results <- usdata_wide %>%
sir_byfutime(
dattype = "seer",
ybreak_vars = c("race.1", "t_dco.1"),
xbreak_var = "none",
futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
count_var = "count_spc",
refrates_df = us_refrates_icd2,
calc_total_row = TRUE,
calc_total_fu = TRUE,
region_var = "registry.1",
age_var = "fc_agegroup.1",
sex_var = "sex.1",
year_var = "t_yeardiag.1",
race_var = "race.1",
site_var = "t_site_icd.2", #using grouping by second cancer incidence
futime_var = "p_futimeyrs",
alpha = 0.05)
#>
#> [INFO Cases 0 PYARs] There are conflicts where strata with 0 follow-up time have data in observed.
#> ℹ 30 strata are affected.
#> - This might be caused by cases where SPC occured at the same day as first cancer.
#> - You can check this by excluding all cases from wide_df, where date of first diagnosis is equal.
#> ! Check attribute `problems_not_empty` of results to see what strata are affected.
#> [INFO Unexpected Cases] There are observed cases in the results file that do not occur in the refrates_df.
#> ℹ 2665 strata are affected.
#> A possible explanation can be:
#> - DCO cases or
#> - diagnosis of second cancer occured in different time period than first cancer
#> ! Check attribute `notes_refcases` of results to see what strata are affected.
#>
sircalc_results %>% print(n = 100)
#> # A tidytable: 421,430 × 22
#> age region sex race year yvar_name yvar_label fu_time t_site observed
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C14 0
#> 2 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C18 0
#> 3 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C34 0
#> 4 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C44 0
#> 5 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C50 0
#> 6 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C54 0
#> 7 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C64 0
#> 8 00 - … SEER … Fema… Black 1990… total_var Overall to 1 m… C80 0
#> 9 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C14 0
#> 10 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C18 0
#> 11 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C34 0
#> 12 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C44 0
#> 13 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C50 0
#> 14 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C54 0
#> 15 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C64 0
#> 16 00 - … SEER … Fema… Black 1990… total_var Overall 0.0833… C80 0
#> 17 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C14 0
#> 18 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C18 0
#> 19 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C34 0
#> 20 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C44 0
#> 21 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C50 0
#> 22 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C54 0
#> 23 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C64 0
#> 24 00 - … SEER … Fema… Black 1990… total_var Overall 0.167-… C80 0
#> 25 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C14 0
#> 26 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C18 0
#> 27 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C34 0
#> 28 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C44 0
#> 29 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C50 0
#> 30 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C54 0
#> 31 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C64 0
#> 32 00 - … SEER … Fema… Black 1990… total_var Overall 1-5 ye… C80 0
#> 33 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C14 0
#> 34 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C18 0
#> 35 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C34 0
#> 36 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C44 0
#> 37 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C50 0
#> 38 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C54 0
#> 39 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C64 0
#> 40 00 - … SEER … Fema… Black 1990… total_var Overall 5-10 y… C80 0
#> 41 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C14 0
#> 42 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C18 0
#> 43 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C34 1
#> 44 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C44 0
#> 45 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C50 0
#> 46 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C54 0
#> 47 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C64 0
#> 48 00 - … SEER … Fema… Black 1990… total_var Overall 10+ ye… C80 0
#> 49 00 - … SEER … Fema… Black 1990… total_var Overall Total … C14 0
#> 50 00 - … SEER … Fema… Black 1990… total_var Overall Total … C18 0
#> 51 00 - … SEER … Fema… Black 1990… total_var Overall Total … C34 1
#> 52 00 - … SEER … Fema… Black 1990… total_var Overall Total … C44 0
#> 53 00 - … SEER … Fema… Black 1990… total_var Overall Total … C50 0
#> 54 00 - … SEER … Fema… Black 1990… total_var Overall Total … C54 0
#> 55 00 - … SEER … Fema… Black 1990… total_var Overall Total … C64 0
#> 56 00 - … SEER … Fema… Black 1990… total_var Overall Total … C80 0
#> 57 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C14 0
#> 58 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C18 0
#> 59 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C34 0
#> 60 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C44 0
#> 61 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C50 0
#> 62 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C54 0
#> 63 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C64 0
#> 64 00 - … SEER … Fema… Black 1990… race.1 Black to 1 m… C80 0
#> 65 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C14 0
#> 66 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C18 0
#> 67 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C34 0
#> 68 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C44 0
#> 69 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C50 0
#> 70 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C54 0
#> 71 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C64 0
#> 72 00 - … SEER … Fema… Black 1990… race.1 Black 0.0833… C80 0
#> 73 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C14 0
#> 74 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C18 0
#> 75 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C34 0
#> 76 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C44 0
#> 77 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C50 0
#> 78 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C54 0
#> 79 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C64 0
#> 80 00 - … SEER … Fema… Black 1990… race.1 Black 0.167-… C80 0
#> 81 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C14 0
#> 82 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C18 0
#> 83 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C34 0
#> 84 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C44 0
#> 85 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C50 0
#> 86 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C54 0
#> 87 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C64 0
#> 88 00 - … SEER … Fema… Black 1990… race.1 Black 1-5 ye… C80 0
#> 89 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C14 0
#> 90 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C18 0
#> 91 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C34 0
#> 92 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C44 0
#> 93 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C50 0
#> 94 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C54 0
#> 95 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C64 0
#> 96 00 - … SEER … Fema… Black 1990… race.1 Black 5-10 y… C80 0
#> 97 00 - … SEER … Fema… Black 1990… race.1 Black 10+ ye… C14 0
#> 98 00 - … SEER … Fema… Black 1990… race.1 Black 10+ ye… C18 0
#> 99 00 - … SEER … Fema… Black 1990… race.1 Black 10+ ye… C34 1
#> 100 00 - … SEER … Fema… Black 1990… race.1 Black 10+ ye… C44 0
#> # ℹ 421,330 more rows
#> # ℹ 12 more variables: expected <dbl>, sir <dbl>, sir_lci <dbl>, sir_uci <dbl>,
#> # pyar <dbl>, n_base <dbl>, ref_inc_cases <dbl>, ref_population_pyar <dbl>,
#> # ref_inc_crude_rate <dbl>, fu_time_sort <int>, yvar_sort <int>,
#> # warning <chr>
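Each row of sircalc_results refers to one follow-up interval defined by futime_breaks. The following rough sketch shows how person-years at risk (pyar) and observed cases can be distributed over such intervals. Each person contributes the part of their follow-up time that falls into an interval, and a counted SPC is assigned to the interval that contains the end of follow-up. split_by_futime() is a hypothetical helper for illustration, not the sir_byfutime() implementation, which additionally stratifies by age, sex, region, year and site.

```r
# Hypothetical sketch: allocate person-years and counted events to FU intervals
split_by_futime <- function(futime, count, breaks) {
  lower <- breaks[-length(breaks)]
  upper <- breaks[-1]
  # person-years inside [lower, upper): clip each follow-up time to the interval
  pyar <- vapply(seq_along(lower), function(i) {
    sum(pmax(0, pmin(futime, upper[i]) - lower[i]))
  }, numeric(1))
  # counted events fall into the interval containing the end of follow-up
  observed <- tabulate(findInterval(futime[count == 1], breaks),
                       nbins = length(lower))
  data.frame(lower = lower, upper = upper, pyar = pyar, observed = observed)
}

# same breaks as in the sir_byfutime() call above
split_by_futime(futime = c(0.5, 7.87, 11.5), count = c(1, 0, 1),
                breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf))
```

The pyar column sums to the total follow-up time of 19.87 years, and the two counted SPCs land in the 0.167-1 year and 10+ year intervals.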
#summarize_sir_results() is versatile. Here, for example, is a summary with minimal output:
sircalc_results %>%
#summarize results across region, age, year, race and t_site
summarize_sir_results(.,
summarize_groups = c("region", "age", "year", "race"),
summarize_site = TRUE,
output = "long", output_information = "minimal",
add_total_row = "only", add_total_fu = "no",
collapse_ci = FALSE, shorten_total_cols = TRUE,
fubreak_var_name = "fu_time", ybreak_var_name = "yvar_name",
xbreak_var_name = "none", site_var_name = "t_site",
alpha = 0.05
) %>%
dplyr::select(-region, -age, -year, -race, -sex, -yvar_name)
#> Warning: The results file `sir_df` contains observed cases in i_observed that do not occur in the refrates_df (ref_inc_cases).
#> Therefore calculation of the variables n_base and ref_population_pyar is ambiguous.
#> We take the first value of each variable. Expect small inconsistencies in the calculation of n_base, ref_population_pyar and ref_inc_crude_rate across strata.
#> ! If you want to know more, please check the `warnings` column of `sir_df`.
#> # A tidytable: 7 × 8
#> yvar_label fu_time fu_time_sort t_site observed expected sir sir_ci
#> <chr> <chr> <int> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 Overall to 1 month 1 Total 306 20.6 14.9 13.25…
#> 2 Overall 0.0833-0.167 ye… 2 Total 74 20.4 3.62 2.84 …
#> 3 Overall 0.167-1 years 3 Total 717 196. 3.65 3.39 …
#> 4 Overall 1-5 years 4 Total 2995 760. 3.94 3.8 -…
#> 5 Overall 5-10 years 5 Total 3113 605. 5.14 4.96 …
#> 6 Overall 10+ years 6 Total 4254 502. 8.47 8.22 …
#> 7 Overall Total 0 to Inf … 7 Total 11459 2105. 5.44 5.34 …
sessionInfo()
#> R version 4.3.2 (2023-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 11 x64 (build 22631)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: Europe/Berlin
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] msSPChelpR_0.9.1 magrittr_2.0.3 dplyr_1.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] jsonlite_1.8.8 compiler_4.3.2 tidyselect_1.2.0 stringr_1.5.1
#> [5] tidytable_0.10.2 tidyr_1.3.0 jquerylib_0.1.4 yaml_2.3.8
#> [9] fastmap_1.1.1 R6_2.5.1 generics_0.1.3 sjlabelled_1.2.0
#> [13] knitr_1.45 forcats_1.0.0 tibble_3.2.1 insight_0.19.7
#> [17] lubridate_1.9.3 bslib_0.6.1 pillar_1.9.0 rlang_1.1.3
#> [21] utf8_1.2.4 stringi_1.8.3 cachem_1.0.8 xfun_0.41
#> [25] sass_0.4.8 timechange_0.2.0 cli_3.6.2 withr_3.0.0
#> [29] digest_0.6.34 rstudioapi_0.15.0 haven_2.5.4 hms_1.1.3
#> [33] lifecycle_1.0.4 vctrs_0.6.5 data.table_1.14.10 evaluate_0.23
#> [37] glue_1.7.0 fansi_1.0.6 rmarkdown_2.25 purrr_1.0.2
#> [41] tools_4.3.2 pkgconfig_2.0.3 htmltools_0.5.7