Step 4: Obtain aggregated data on temporal symmetry

Introduction

In this vignette we explore the functionality and arguments of the summariseTemporalSymmetry() function. This function takes cdm$intersect, introduced in the previous vignette (Step 1. Generate a sequence cohort), and produces aggregated statistics: the frequency of each time gap between the initiation of the marker and the initiation of the index (marker_date \(-\) index_date). How the function works is best illustrated with an example.
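Before turning to the package itself, the idea of a gap frequency table can be sketched in base R. This is only an illustration on made-up data: the data frame toy, its dates, and the column gap_days are hypothetical, and the gap is computed in days to avoid assuming anything about how the package counts months. It is not the implementation used by summariseTemporalSymmetry().

```r
# Hypothetical toy data: one row per person with both initiation dates.
toy <- data.frame(
  subject_id  = 1:6,
  index_date  = as.Date(c("2020-01-10", "2020-03-01", "2020-05-20",
                          "2021-02-01", "2021-06-15", "2021-06-15")),
  marker_date = as.Date(c("2020-01-15", "2020-02-25", "2020-05-25",
                          "2021-01-27", "2021-06-20", "2021-06-27"))
)

# Gap = marker_date - index_date, in whole days (negative: marker came first).
toy$gap_days <- as.integer(toy$marker_date - toy$index_date)

# Frequency of each distinct gap, one row per gap value.
gap_counts <- as.data.frame(table(gap_days = toy$gap_days),
                            stringsAsFactors = FALSE)
names(gap_counts) <- c("gap_days", "count")
gap_counts$gap_days <- as.integer(gap_counts$gap_days)

print(gap_counts)
```

Each row of gap_counts plays the role of one (variable_level, estimate_value) pair in the summarised result discussed in this vignette.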

Recall that in the previous vignette we used cdm$aspirin and cdm$acetaminophen to generate cdm$intersect as follows:

# Generate a sequence cohort
cdm <- generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "aspirin",
  markerTable = "acetaminophen",
  name = "intersect",
  combinationWindow = c(0, Inf)
)

Obtaining temporal symmetry

summariseTemporalSymmetry(cohort = cdm$intersect) |> 
  dplyr::glimpse()
#> Rows: 558
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_name &&& marker_name", "index_name &&& marker_…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "temporal_symmetry", "temporal_symmetry", "temporal_s…
#> $ variable_level   <chr> "-29", "40", "10", "20", "69", "-35", "89", "36", "10…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "6", "7", "6", "10", "5", "5", "5", "5", NA, "9", "9"…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

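By default, the gap between the two initiations is measured in months. Because the gap is marker_date \(-\) index_date, a negative value means the marker was initiated before the index. In this example, the first row shows \(6\) cases in which the index was initiated \(29\) months after the marker, whereas the second row shows \(7\) cases in which the index was initiated \(40\) months before the marker. An NA in estimate_value marks a count that has been obscured because it falls below the minimum cell count (see Modify minCellCount).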
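The sign convention can be made explicit with a small base R sketch. The data frame res below is hand-built: its first two rows copy the variable_level and estimate_value pairs from the first two rows of the output above, and the third row is a hypothetical obscured count added for illustration. The columns gap, count and order are names invented here, not part of the package output.

```r
# Hand-built rows in the shape of the summarised result.
res <- data.frame(
  variable_level = c("-29", "40", "0"),  # third row is hypothetical
  estimate_value = c("6", "7", NA),      # NA stands for an obscured count
  stringsAsFactors = FALSE
)

# Both columns are stored as character, so convert before comparing.
res$gap   <- as.integer(res$variable_level)
res$count <- as.integer(res$estimate_value)

# gap = marker_date - index_date: negative means the marker came first.
res$order <- ifelse(res$gap < 0, "marker before index",
             ifelse(res$gap > 0, "index before marker", "same period"))

print(res[, c("gap", "count", "order")])
```

Converting variable_level and estimate_value to integers first matters because both are character columns; comparing them as strings would sort "-29" and "40" lexically rather than numerically.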

Modify the cohort based on cohort_definition_id

The cohortId parameter subsets the cohort table passed to summariseTemporalSymmetry(). For example, to include only cohort_definition_id \(= 1\) from cdm$intersect, one could do the following:

summariseTemporalSymmetry(cohort = cdm$intersect,
                          cohortId = 1) |> 
  dplyr::glimpse()
#> Rows: 558
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_name &&& marker_name", "index_name &&& marker_…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "temporal_symmetry", "temporal_symmetry", "temporal_s…
#> $ variable_level   <chr> "26", "277", "261", "375", "162", "178", "268", "17",…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "9", NA, NA, NA, NA, NA, NA, "17", "5", "13", NA, "6"…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

Once again, this changes nothing, because every entry in cdm$intersect has cohort_definition_id \(= 1\).

Modify timescale

Recall that the default timescale is month; it can also be set to day or year.

summariseTemporalSymmetry(cohort = cdm$intersect,
                          timescale = "day") |> 
  dplyr::glimpse()
#> Rows: 1,350
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_name &&& marker_name", "index_name &&& marker_…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "temporal_symmetry", "temporal_symmetry", "temporal_s…
#> $ variable_level   <chr> "11444", "242", "17011", "-1586", "-1289", "3455", "5…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> NA, "5", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
summariseTemporalSymmetry(cohort = cdm$intersect,
                          timescale = "year") |> 
  dplyr::glimpse()
#> Rows: 94
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_name &&& marker_name", "index_name &&& marker_…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "temporal_symmetry", "temporal_symmetry", "temporal_s…
#> $ variable_level   <chr> "17", "24", "26", "-8", "-27", "63", "2", "-5", "14",…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "12", "10", "7", "12", NA, NA, "130", "48", "31", "13…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

Modify minCellCount

By default, the minimum number of events to be reported is 5; counts below this are obscured. Setting minCellCount to 0 reports all results:

summariseTemporalSymmetry(cohort = cdm$intersect,
                          minCellCount = 0) |> 
  dplyr::glimpse()
#> Rows: 558
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "Synthea synthetic health database", "Synthea synthet…
#> $ group_name       <chr> "index_name &&& marker_name", "index_name &&& marker_…
#> $ group_level      <chr> "1191_aspirin &&& 161_acetaminophen", "1191_aspirin &…
#> $ strata_name      <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "temporal_symmetry", "temporal_symmetry", "temporal_s…
#> $ variable_level   <chr> "26", "277", "261", "375", "162", "178", "268", "17",…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "9", "2", "1", "1", "2", "3", "3", "17", "5", "13", "…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
CDMConnector::cdmDisconnect(cdm = cdm)
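
The effect of minCellCount can be mimicked in base R. The sketch below assumes that suppression simply replaces any count below the threshold with NA; the helper suppress_small_counts() is hypothetical and is not how the package implements it. The input counts are the first ten values of estimate_value shown above with minCellCount = 0.

```r
# Hypothetical helper: replace counts below the threshold with NA.
suppress_small_counts <- function(count, min_cell_count = 5) {
  out <- as.integer(count)
  out[!is.na(out) & out < min_cell_count] <- NA_integer_
  out
}

# First ten counts from the minCellCount = 0 output above.
counts <- c(9L, 2L, 1L, 1L, 2L, 3L, 3L, 17L, 5L, 13L)

suppressed   <- suppress_small_counts(counts)                      # threshold 5
unsuppressed <- suppress_small_counts(counts, min_cell_count = 0)  # nothing hidden

print(suppressed)
print(unsuppressed)
```

Under this assumption, suppressed has the same pattern of NA values as the estimate_value column from the cohortId example, where the default threshold of 5 applied. A count of exactly 5 is kept.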