msSPChelpR 0.9.1
New Features
- new function
histgroup_iarc()
to create variable for groups of malignant neoplasms considered to be histologically ‘different’ for the purpose of defining multiple tumors, ICD-O-3 (see #100)
- some functions gain new
quiet
argument to suppress rlang::warn()
and rlang::inform()
messages. You can use this when you have checked your results for correctness and want to reduce message output, but keep the progress bars.
asir()
: add World Standard Population 2000-2025 for function with option std_pop=="WHO2000"
as described here: https://seer.cancer.gov/stdpopulations/world.who.html
sir_byfutime()
gains new argument expect_missing_refstrata_df
. You can define another dataframe that contains strata expected to be missing from refrates_df (because they are not explicitly coded with incidence = 0). This can be helpful, if refrates_df has a lot of strata and 0 incidence strata have been removed to save storage space. Internally, the rows of expect_missing_refstrata_df will be appended to refrates_df. This reduces the number of lines reported in attribute problems_missing_ref_strata
. Default setting is expect_missing_refstrata_df = NULL
.
- sample data set for
data("us_second_cancer")
gains new variable t_hist
on histology, i.e. ICD-O-3-Code on tumor morphology (4 digits)
Breaking Changes
- no breaking changes in this version
Bug fixes
- make
calc_refrates()
more robust for missing race_var
(Closes #89)
- fix bug in
calc_refrates()
using calc_totals == TRUE
(Closes #90)
- fix bug in
calc_refrates()
using numeric versions of fill_sites
(Closes #92)
- fix bug in
asir()
that throws error for variable not needed (Closes #95)
Internal
- replace progress bars by
cli
- deprecate
verb.()
syntax from tidytable (Closes #94)
msSPChelpR 0.9.0
New Features
- new function
calc_refrates()
to calculate age-, sex-, region-, year-specific reference rates from a long format dataframe with cancer cases that are counted for incident cases and then matched with a reference population. The resulting reference rates dataframe can directly be used with sir_byfutime()
function.
- functions gain new default
dattype = NULL
and thus are more flexible to take other source data types (Closes #73)
Breaking Changes
- functions
asir
, calc_futime*
, calc_refrates
, ir_crosstab_byfutime
, pat_status*
, renumber_time_id*
, and sir_byfutime
now by default are set to dattype = NULL
. If you relied on automatic variable naming feature, you need to add dattype = "seer"
or dattype = "zfkd"
to your function call.
- fix typo in attribute names: attributes are now correctly named
problems_missing_count_strata
and problems_missing_fu_strata
(Closes #80)
Bug fixes
sir_byfutime()
:
- attributes with notes and problems are now correctly saved to
results_df
Internal
- deprecated functions from
tidytable
package have been replaced (Closes #71 and #74)
msSPChelpR 0.8.7
New Features
- new function
sir_ratio()
and related sir_ratio_lci()
and sir_ratio_uci()
to calculate ratio of two SIRs/SMRs to get relative risk and confidence limits for this ratio.
- tidytable variant of reshape_long function, i.e.
reshape_long_tt()
⇒ the _tt variants usually have smaller memory use than tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
summarize_sir_results()
:
- add ability to summarize by different site_var than the one used in
sir_byfutime()
Bug fixes
summarize_sir_results()
:
- PYARs are now correctly calculated when using
summarize_site == TRUE
. Previously the results incorrectly counted each site multiple times. (Closes #62)
pat_status()
:
- update default values for
dattype = "zfkd"
Internal
- add R-CMD-Check to github actions
msSPChelpR 0.8.6
New Features
- new sample data set for standard populations ⇒
data("standard_population")
- new sample data set for us population ⇒
data("population_us")
(Closes #58)
Bug fixes
sir_byfutime()
: change output of integer columns to numeric to fix bug in summarize_sir_results()
(Closes #59)
Other changes
- add examples to function documentation (Closes #56)
- remove “R” from package title (Closes #57)
- update package description (Closes #54)
- update introduction vignette
vignette("introduction")
msSPChelpR 0.8.5 - 2020-09-28
New Features
- tidytable variants of functions, i.e.
reshape_wide_tt()
, renumber_time_id_tt()
, pat_status_tt()
, vital_status_tt()
, calc_futime_tt()
⇒ the _tt variants usually have smaller memory use than tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
sir_byfutime()
:
- is much faster using tidytable package
- gained the option
race_var
to optionally stratify SIR calculations by race.
summarize_sir_results()
:
- new function that increases functionality in summarizing results from
sir_byfutime()
function
- new option to define custom
site_var_name
- new package website https://marianschmidt.github.io/msSPChelpR
- new sample datasets included in the package to demonstrate examples (#36)
Breaking Changes
sir_byfutime()
:
- options
add_total_row
and add_total_fu
are replaced by calc_total_row
and calc_total_fu
. These are logical parameters now. The positioning of total rows and columns is completely handled by the summarize_sir_results()
function now. There total rows can be set to top and bottom and total columns to left and right.
- option
expcount_src
including related parameters stdpop_df
, refpop_df
, std_pop
, truncate_std_pop
and pyar_var
have been removed. Function sir_byfutime()
will only work calculating expected counts based on reference rates, not within the cohort of the dataset. To calculate expected based on the cohort, a new function create_refrates
will be added in the future. (#41)
- option
collapse_ci
has been removed and added to summarize_sir_results()
instead.
- option name for tumor site variable changed from
icdcat_var
to site_var
- option name for age/age group variable changed from
agegroup_var
to age_var
- in total the parameters
expcount_src
, futime_src
, stdpop_df
, refpop_df
, std_pop
, truncate_std_pop
, pyar_var
, icdcat_var
, collapse_ci
have been removed to simply the function ⇒ make sure you remove these arguments from your sir_byfutime()
function calls.
sir()
:
- is superseded by the use of
sir_byfutime()
. To migrate your former sir()
functions, you can simply use sir_byfutime(, futime_breaks = "none")
that will yield the same results.
summarize_sir_results()
:
- option name for tumor site variable changed from
summarize_icdcat
to summarize_site
reshape_long_tidyr()
:
- option
var_selection
is deprecated. Please select variables before running the reshape_long_*
functions.
asir()
:
- option name for age/age group variable changed from
agegroup_var
to age_var
- option name for tumor site variable changed from
icdcat_var
to site_var
pat_status()
, pat_status_tt()
, vital_status()
, and vital_status_tt()
:
- Capitalized default variable labelling.
- This might break code that relied on using the labels coming out of these functions in later filter or mutate functions.
ir_crosstab_byfutime()
:
- option
futime_breaks
now uses breaks in years instead of months as previously.
- default
futime_var
is now follow-up time in years
- now requires dplyr version 1.0.0
- now requires tidytable package
- the default option name for tumor site variable changed from
icdcat_var
to site_var
. This need manual update of function calls of sir_byfutime()
and asir()
, if option is specified.
- the default variable name for tumor site in all functions has been changed from
t_icdcat
to t_site
. So the reference data frames used will need to have a t_site
column.
- the data.table variants of functions (
renumber_time_id_dt()
, pat_status_dt()
, reshape_long_dt()
, reshape_wide_dt()
, vital_status_dt()
) have been removed for simplicity, please use tidytable variants, i.e. reshape_wide_tt()
, renumber_time_id_tt()
, pat_status_tt()
, vital_status_tt()
, calc_futime_tt()
, instead. They will give the same data.table output and same performance.
Bug Fixes
- implement new reliable routine to split df when
reshape_wide()
with option chunks
is used. Closes #1.
- Sorting of columns in wide datasets by
reshape_wide_tidyr()
and reshape_wide_tt()
is now preserved. Closes #31.
- ensure sorting in
renumer_time_id()
and make sure that new_time_id_var
is returned as integer.
- fix bug in
pat_status_*(., check = TRUE)
option
- improve internal tests in
sir_byfutime()
so that PYARs do not get lost before running summary function
sir_byfutime()
now also gives correct results if range of futime_breaks
is not 0-Inf but smaller
msSPChelpR 0.8.4 - 2020-05-21
New Features
- add timevar_max option to
renumber_time_id()
function; use sorting by date of diagnosis instead of old time_id_var
- various improvements to
reshape_wide_tidyr()
function
- various improvements to
reshape_wide_dt()
function which is much faster now and uses data.table::dcast
instead of stats::reshape
now
- various improvements to
pat_status()
and pat_status_dt()
functions
- option summarize_icdcat in
summarize_sir_results()
is now functional
- update vignette
vignette("introduction")
Bug Fixes
- fix incomplete check for required variables in
pat_status()
and pat_status_dt()
functions
- fix error in check for required variables in
renumber_time_id()
that broke functions
- fix bug in check for end of FU time in
pat_status()
and calc_futime()
- implement new tidyselect routine using
tidyselect::all_of
in summarize_sir_results()
msSPChelpR 0.8.3
New Features
- new faster version of reshape_long based on data.table
- start new vignette on workflow from filtered long dataset to follow-up times
vignette("patstatus_futime")
Bug Fixes
- implement new tidyselect routine using
tidyselect::all_of
for vector-based variable selection
- implement correct referencing in
vital_status_dt
and pat_status_dt
- add exports from
data.table
- update documentation for sir and sir_byfutime functions
- make
reshape_long
function work
msSPChelpR 0.8.2
msSPChelpR 0.8.1
New Features
- new faster version of vital_status function using data.table
- new faster version of pat_status function using data.table
msSPChelpR 0.8.0
New Features
- new faster version of reshape_wide_dt function based on data.table and without problematic slices done by reshape_wide
- new faster version of renumber_time_id function based on data.table
msSPChelpR 0.7.4
New Features
- new function renumber_time_id
msSPChelpR 0.7.3
Bug Fixes
- add check to revert status_var to numeric in case it was created with option as_labelled_factor
- fix label bug in life_var_new
msSPChelpR 0.7.2
- add option as_labelled_factor to vital_status function
- fix newly introduced error in vital_status function
msSPChelpR 0.7.1
- fix error in vital_status function by replacing sjlabelled::get_label function
msSPChelpR 0.7.0
- fix error in pat_status and vital_status functions due to change in sjlabelled package
msSPChelpR 0.6.10
- rebuild description file and manual
msSPChelpR 0.6.9
- remove nest_legacy functions and use new tidyr syntax, close #19
msSPChelpR 0.6.8
- make summarize_sir_results function work without break variables
msSPChelpR 0.6.7
- for function sir_byfutime ⇒ make option
add_total_row
work, even if option ybreak_vars = "none"
msSPChelpR 0.6.6
- Make use of time_id_var and case_id_var use coherent across reshape functions
msSPChelpR 0.6.5
msSPChelpR 0.6.4
- Added a
NEWS.md
file to track changes to the package.
msSPChelpR 0.6.3
- add option
futime_breaks = "none"
to sir_byfutime
function
major changes in msSPChelpR 0.6.0
- includes a new function to calculate crude (absolute) incidence rates a tabulate them by whatever number of grouping variables and it can be used as a Table 1 for publications ⇒ The function is called msSPChelpR::ir_crosstab
- includes a new function to calculate SIRs (standardized incidence ratios) by whatever strata you desire (unlimited ybreak_vars; one xbreak_var) and additionally customized breaks for follow-up times (default is: to 6 months, .5-1 year, 1-5 years, 5-10 years, >10 years) ⇒ attention, it only makes sense to stratify results (ybreak_vars or xbreak_var) by variables measured at baseline and not for variables that are dependent on the occurrence of an SPC) ⇒ function msSPChelpR::sir_byfutime ⇒ depending on the number of stratification variables you are using, this function may result in a very long results data.frame. So please use it together with the new function msSPChelpR::summarize_sir_results
- includes a new function to summarize results dataframes from SIR calculations
- New reshape functions that are faster and are using less memory