Creating a Findings SDTM domain

Introduction

This article describes how to create a Findings SDTM domain using the {sdtm.oak} package. Examples are currently presented and tested in the context of the VS domain.

Before reading this article, we recommend reviewing the “Creating an Events Domain” article, which explains {sdtm.oak} concepts such as oak_id_vars and condition_add in detail. That article also describes which mapping algorithms (functions) to use for different mappings and how they work.

In this article, we will dive directly into programming and provide further explanation only where it is required.

Programming workflow

In {sdtm.oak}, we process one raw dataset at a time. Similar raw datasets (for example, Vital Signs - Screening (OID: vs_raw) and Vital Signs - Treatment (OID: vs_t_raw)) can be stacked together before processing.

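For each raw dataset, work through the steps below from “Read in data” to “Repeat Map Topic and Map Rest”. Complete these steps for every raw dataset before creating the SDTM derived variables and adding labels and attributes.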
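Stacking similar raw datasets, as described above, is simplest when they share the same column names. Here is a minimal base R sketch, assuming two hypothetical extracts, vs_scr and vs_trt (with illustrative values), that stand in for the screening and treatment vitals forms:

```r
# Hypothetical screening and treatment extracts with identical columns;
# the values are illustrative only.
vs_scr <- data.frame(PATNUM = c(375L, 376L), SYS_BP = c(158L, 85L))
vs_trt <- data.frame(PATNUM = c(375L, 376L), SYS_BP = c(94L, 126L))

# Stack the two sources into one raw dataset before any mapping is done.
# rbind() stops with an error if the column names differ.
vs_stacked <- rbind(vs_scr, vs_trt)
print(vs_stacked)
```

dplyr::bind_rows() would give the same result here. Unlike rbind(), it also accepts inputs whose columns differ and fills the missing ones with NA.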

Read in data

Read all raw datasets into the environment. In this example, the raw dataset is vs_raw, which is included in the package and can be read with the code below:

library(sdtm.oak)
library(dplyr)

vs_raw <- read.csv(system.file("raw_data/vitals_raw_data.csv",
  package = "sdtm.oak"
))
PATNUM  FORML        ASMNTDN  TMPTC      VTLD       VTLTM  SUBPOS          SYS_BP  DIA_BP  PULSE  RESPRT  TEMP   TEMPLOC            OXY_SAT  LAT    LOC
375     Vital Signs  0        Pre-dose   16-May-15  7:25   PRONE           158     92      63     17      40.48  SKIN               98       RIGHT  FINGER
375     Vital Signs  0        Post-dose  16-May-15  10:25  SEMI-RECUMBENT  94      78      76     20      36.75  TYMPANIC MEMBRANE  99       LEFT   FINGER
375     Vital Signs  0                   6-May-18   2:01   PRONE           117     62      66     15      29.45  ORAL CAVITY        96       LEFT   FINGER
376     Vital Signs  1                                                     NA      NA      NA     NA      NA                        NA
376     Vital Signs  0        Pre-dose   23-Oct-08  1:19   PRONE           85      68      73     21      38.25  AXILLA             93       RIGHT  FINGER
376     Vital Signs  0        Post-dose  23-Oct-08  3:19   PRONE           126     81      56     18      38.08  TYMPANIC MEMBRANE  93       LEFT   FINGER
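In the fourth record (ASMNTDN = 1), only the numeric columns show NA. This follows from how read.csv() treats blank fields: they become NA in numeric columns but stay empty strings in character columns. The short check below uses hypothetical inline CSV text to show the difference. It matters later because the SYSBP mapping drops record 4, whose SYS_BP is NA.

```r
# Hypothetical two-record CSV; the second record has blank fields,
# like the ASMNTDN = 1 record in the raw data above.
csv_text <- paste(
  "PATNUM,ASMNTDN,TMPTC,SYS_BP",
  "375,0,Pre-dose,158",
  "376,1,,",
  sep = "\n"
)
raw_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# A blank field in a numeric column is read as NA ...
print(raw_demo$SYS_BP)
# ... while a blank field in a character column stays an empty string.
print(raw_demo$TMPTC)
```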

Create oak_id_vars

vs_raw <- vs_raw %>%
  generate_oak_id_vars(
    pat_var = "PATNUM",
    raw_src = "vitals"
  )
oak_id  raw_source  patient_number  PATNUM  FORML        SYS_BP  DIA_BP
1       vitals      375             375     Vital Signs  158     92
2       vitals      375             375     Vital Signs  94      78
3       vitals      375             375     Vital Signs  117     62
4       vitals      376             376     Vital Signs  NA      NA
5       vitals      376             376     Vital Signs  85      68
6       vitals      376             376     Vital Signs  126     81
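The output above shows three identifier columns placed in front of the raw data. The base R sketch below re-creates that layout, inferred only from the printed output. add_id_vars_sketch() is a hypothetical helper written for illustration, and generate_oak_id_vars() may work differently internally.

```r
# Hypothetical helper that re-creates the three identifier columns seen in
# the printed output; it is not the implementation of generate_oak_id_vars().
add_id_vars_sketch <- function(raw, pat_var, raw_src) {
  id_cols <- data.frame(
    oak_id = seq_len(nrow(raw)),          # one id per raw record, in row order
    raw_source = rep(raw_src, nrow(raw)), # constant label for the raw dataset
    patient_number = raw[[pat_var]],      # copy of the patient identifier
    stringsAsFactors = FALSE
  )
  # Identifier columns first, then the original raw columns
  cbind(id_cols, raw)
}

raw_ids_demo <- data.frame(PATNUM = c(375L, 375L, 376L), SYS_BP = c(158L, 94L, NA))
with_ids <- add_id_vars_sketch(raw_ids_demo, pat_var = "PATNUM", raw_src = "vitals")
print(with_ids)
```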

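Read in the DM domain

Read the DM domain into the environment as a data frame named dm. It supplies RFXSTDTC, the reference date that derive_study_day() uses later to derive VSDY.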

Read in CT

Controlled terminology (CT) is part of the SDTM specification and is prepared by the user. In this example, the study CT file is sdtm_ct.csv, which is included in the package and can be read with the code below:

study_ct <- read.csv(system.file("raw_data/sdtm_ct.csv",
  package = "sdtm.oak"
))
codelist_code  term_code  term_value  collected_value  term_preferred_term     term_synonyms
C66726         C25158     CAPSULE     Capsule          Capsule Dosage Form     cap
C66726         C25394     PILL        Pill             Pill Dosage Form
C66726         C29167     LOTION      Lotion           Lotion Dosage Form
C66726         C42887     AEROSOL     Aerosol          Aerosol Dosage Form     aer
C66726         C42944     INHALANT    Inhalant         Inhalant Dosage Form
C66726         C42946     INJECTION   Injection        Injectable Dosage Form
C66726         C42953     LIQUID      Liquid           Liquid Dosage Form
C66726         C42998     TABLET      Tablet           Tablet Dosage Form      tab
C66728         C25629     BEFORE      Prior            Prior
C66728         C53279     ONGOING     Continue         Continue                Continuous
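The mapping functions receive the study CT through ct_spec and a codelist code through ct_clst. Conceptually, standardizing a collected value is a lookup restricted to one codelist. The base R sketch below shows that idea with the hypothetical helper ct_lookup_sketch() and a few rows copied from the table above. It matches on collected_value only and returns NA when there is no match, which may differ from what assign_ct() does (for example, the sketch ignores term_synonyms).

```r
# A few rows copied from the study CT printed above
ct_demo <- data.frame(
  codelist_code = c("C66726", "C66726", "C66728"),
  term_value = c("CAPSULE", "TABLET", "BEFORE"),
  collected_value = c("Capsule", "Tablet", "Prior"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up collected values within one codelist only.
# Unmatched values become NA here; assign_ct() may behave differently.
ct_lookup_sketch <- function(values, ct, clst) {
  ct_sub <- ct[ct$codelist_code == clst, ]
  ct_sub$term_value[match(values, ct_sub$collected_value)]
}

# "Prior" belongs to codelist C66728, so it has no match in C66726
std_terms <- ct_lookup_sketch(c("Tablet", "Capsule", "Prior"), ct_demo, clst = "C66726")
print(std_terms)
```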

Map Topic Variable

This raw dataset has multiple topic variables. Let's start with the first one: map the topic variable SYSBP from the raw variable SYS_BP.

# Map topic variable SYSBP from the raw variable SYS_BP.
vs_sysbp <-
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "SYS_BP",
    tgt_var = "VSTESTCD",
    tgt_val = "SYSBP",
    ct_spec = study_ct,
    ct_clst = "C66741"
  ) %>%
  # Filter for records where VSTESTCD is not empty.
  # Only these records need qualifier mappings.
  dplyr::filter(!is.na(.data$VSTESTCD))
oak_id  raw_source  patient_number  VSTESTCD
1       vitals      375             SYSBP
2       vitals      375             SYSBP
3       vitals      375             SYSBP
5       vitals      376             SYSBP
6       vitals      376             SYSBP
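The output keeps records 1, 2, 3, 5, and 6: record 4 has no SYS_BP value, so it receives no VSTESTCD and the filter removes it. The base R sketch below reproduces this result, assuming the test code is hardcoded only where the raw value is present. That assumption is inferred from the output, not from the internals of hardcode_ct(), and the object names are illustrative.

```r
# Record keys and SYS_BP values as printed in the raw data above
sysbp_demo <- data.frame(
  oak_id = 1:6,
  SYS_BP = c(158L, 94L, 117L, NA, 85L, 126L)
)

# Assumed behaviour, inferred from the printed output:
# the test code is hardcoded only where the raw value is present.
sysbp_demo$VSTESTCD <- ifelse(is.na(sysbp_demo$SYS_BP), NA_character_, "SYSBP")

# Keep only the records that received a topic value (record 4 is dropped)
sysbp_topic <- sysbp_demo[!is.na(sysbp_demo$VSTESTCD), c("oak_id", "VSTESTCD")]
print(sysbp_topic)
```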

Map Rest of the Variables

Map the remaining variables that apply to the topic variable SYSBP. These can include qualifier, identifier, and timing variables.

# Map the qualifiers of topic variable SYSBP.
vs_sysbp <- vs_sysbp %>%
  # Map VSTEST using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "SYS_BP",
    tgt_var = "VSTEST",
    tgt_val = "Systolic Blood Pressure",
    ct_spec = study_ct,
    ct_clst = "C67153",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRES using assign_no_ct algorithm
  assign_no_ct(
    raw_dat = vs_raw,
    raw_var = "SYS_BP",
    tgt_var = "VSORRES",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRESU using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "SYS_BP",
    tgt_var = "VSORRESU",
    tgt_val = "mmHg",
    ct_spec = study_ct,
    ct_clst = "C66770",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSPOS using assign_ct algorithm
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "SUBPOS",
    tgt_var = "VSPOS",
    ct_spec = study_ct,
    ct_clst = "C71148",
    id_vars = oak_id_vars()
  )
oak_id  raw_source  patient_number  VSTESTCD  VSTEST                   VSORRES  VSORRESU  VSPOS
1       vitals      375             SYSBP     Systolic Blood Pressure  158      mmHg      PRONE
2       vitals      375             SYSBP     Systolic Blood Pressure  94       mmHg      SEMI-RECUMBENT
3       vitals      375             SYSBP     Systolic Blood Pressure  117      mmHg      PRONE
5       vitals      376             SYSBP     Systolic Blood Pressure  85       mmHg      PRONE
6       vitals      376             SYSBP     Systolic Blood Pressure  126      mmHg      PRONE
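Each qualifier mapping above passes id_vars = oak_id_vars(), and the output still has five records: values are added only to records that already exist in the topic data. The sketch below approximates this with a base R left join (merge() with all.x = TRUE) on the three identifier columns. It illustrates the observed behaviour with hypothetical object names and is not the package's code.

```r
# Topic records from the previous step: record 4 has no SYS_BP and was dropped
topic_demo <- data.frame(
  oak_id = c(1L, 2L, 3L, 5L, 6L),
  raw_source = "vitals",
  patient_number = c(375L, 375L, 375L, 376L, 376L),
  VSTESTCD = "SYSBP",
  stringsAsFactors = FALSE
)

# Raw records carrying the same three identifier columns
raw_keys_demo <- data.frame(
  oak_id = 1:6,
  raw_source = "vitals",
  patient_number = c(375L, 375L, 375L, 376L, 376L, 376L),
  SYS_BP = c(158L, 94L, 117L, NA, 85L, 126L),
  stringsAsFactors = FALSE
)

# Left join on the identifier columns: only existing topic records get a value
id_cols <- c("oak_id", "raw_source", "patient_number")
sysbp_orres <- merge(topic_demo, raw_keys_demo, by = id_cols, all.x = TRUE)
names(sysbp_orres)[names(sysbp_orres) == "SYS_BP"] <- "VSORRES"
print(sysbp_orres)
```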

Repeat Map Topic and Map Rest

This raw data source has other topic variables (DIABP, PULSE, RESP, TEMP, OXYSAT, and VSALL), each with its own qualifiers. Repeat the topic and qualifier mappings for each of them.

# Map topic variable DIABP and its qualifiers.
vs_diabp <-
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "DIA_BP",
    tgt_var = "VSTESTCD",
    tgt_val = "DIABP",
    ct_spec = study_ct,
    ct_clst = "C66741"
  ) %>%
  dplyr::filter(!is.na(.data$VSTESTCD)) %>%
  # Map VSTEST using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "DIA_BP",
    tgt_var = "VSTEST",
    tgt_val = "Diastolic Blood Pressure",
    ct_spec = study_ct,
    ct_clst = "C67153",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRES using assign_no_ct algorithm
  assign_no_ct(
    raw_dat = vs_raw,
    raw_var = "DIA_BP",
    tgt_var = "VSORRES",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRESU using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "DIA_BP",
    tgt_var = "VSORRESU",
    tgt_val = "mmHg",
    ct_spec = study_ct,
    ct_clst = "C66770",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSPOS using assign_ct algorithm
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "SUBPOS",
    tgt_var = "VSPOS",
    ct_spec = study_ct,
    ct_clst = "C71148",
    id_vars = oak_id_vars()
  )

# Map topic variable PULSE and its qualifiers.
vs_pulse <-
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "PULSE",
    tgt_var = "VSTESTCD",
    tgt_val = "PULSE",
    ct_spec = study_ct,
    ct_clst = "C66741"
  ) %>%
  dplyr::filter(!is.na(.data$VSTESTCD)) %>%
  # Map VSTEST using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "PULSE",
    tgt_var = "VSTEST",
    tgt_val = "Pulse Rate",
    ct_spec = study_ct,
    ct_clst = "C67153",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRES using assign_no_ct algorithm
  assign_no_ct(
    raw_dat = vs_raw,
    raw_var = "PULSE",
    tgt_var = "VSORRES",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRESU using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "PULSE",
    tgt_var = "VSORRESU",
    tgt_val = "beats/min",
    ct_spec = study_ct,
    ct_clst = "C66770",
    id_vars = oak_id_vars()
  )

# Map topic variable RESP from the raw variable RESPRT and its qualifiers.
vs_resp <-
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "RESPRT",
    tgt_var = "VSTESTCD",
    tgt_val = "RESP",
    ct_spec = study_ct,
    ct_clst = "C66741"
  ) %>%
  dplyr::filter(!is.na(.data$VSTESTCD)) %>%
  # Map VSTEST using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "RESPRT",
    tgt_var = "VSTEST",
    tgt_val = "Respiratory Rate",
    ct_spec = study_ct,
    ct_clst = "C67153",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRES using assign_no_ct algorithm
  assign_no_ct(
    raw_dat = vs_raw,
    raw_var = "RESPRT",
    tgt_var = "VSORRES",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRESU using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "RESPRT",
    tgt_var = "VSORRESU",
    tgt_val = "breaths/min",
    ct_spec = study_ct,
    ct_clst = "C66770",
    id_vars = oak_id_vars()
  )

# Map topic variable TEMP from raw variable TEMP and its qualifiers.
vs_temp <-
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "TEMP",
    tgt_var = "VSTESTCD",
    tgt_val = "TEMP",
    ct_spec = study_ct,
    ct_clst = "C66741"
  ) %>%
  dplyr::filter(!is.na(.data$VSTESTCD)) %>%
  # Map VSTEST using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "TEMP",
    tgt_var = "VSTEST",
    tgt_val = "Temperature",
    ct_spec = study_ct,
    ct_clst = "C67153",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRES using assign_no_ct algorithm
  assign_no_ct(
    raw_dat = vs_raw,
    raw_var = "TEMP",
    tgt_var = "VSORRES",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRESU using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "TEMP",
    tgt_var = "VSORRESU",
    tgt_val = "C",
    ct_spec = study_ct,
    ct_clst = "C66770",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSLOC from TEMPLOC using assign_ct
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "TEMPLOC",
    tgt_var = "VSLOC",
    ct_spec = study_ct,
    ct_clst = "C74456",
    id_vars = oak_id_vars()
  )

# Map topic variable OXYSAT from raw variable OXY_SAT and its qualifiers.
vs_oxysat <-
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "OXY_SAT",
    tgt_var = "VSTESTCD",
    tgt_val = "OXYSAT",
    ct_spec = study_ct,
    ct_clst = "C66741"
  ) %>%
  dplyr::filter(!is.na(.data$VSTESTCD)) %>%
  # Map VSTEST using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "OXY_SAT",
    tgt_var = "VSTEST",
    tgt_val = "Oxygen Saturation",
    ct_spec = study_ct,
    ct_clst = "C67153",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRES using assign_no_ct algorithm
  assign_no_ct(
    raw_dat = vs_raw,
    raw_var = "OXY_SAT",
    tgt_var = "VSORRES",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSORRESU using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "OXY_SAT",
    tgt_var = "VSORRESU",
    tgt_val = "%",
    ct_spec = study_ct,
    ct_clst = "C66770",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSLAT using assign_ct from raw variable LAT
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "LAT",
    tgt_var = "VSLAT",
    ct_spec = study_ct,
    ct_clst = "C99073",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSLOC using assign_ct from raw variable LOC
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "LOC",
    tgt_var = "VSLOC",
    ct_spec = study_ct,
    ct_clst = "C74456",
    id_vars = oak_id_vars()
  )

# Map topic variable VSALL from raw variable ASMNTDN: if ASMNTDN == 1, then VSTESTCD = VSALL
vs_vsall <-
  hardcode_ct(
    raw_dat = condition_add(vs_raw, ASMNTDN == 1L),
    raw_var = "ASMNTDN",
    tgt_var = "VSTESTCD",
    tgt_val = "VSALL",
    ct_spec = study_ct,
    ct_clst = "C66741"
  ) %>%
  dplyr::filter(!is.na(.data$VSTESTCD)) %>%
  # Map VSTEST using hardcode_ct algorithm
  hardcode_ct(
    raw_dat = vs_raw,
    raw_var = "ASMNTDN",
    tgt_var = "VSTEST",
    tgt_val = "Vital Signs",
    ct_spec = study_ct,
    ct_clst = "C67153",
    id_vars = oak_id_vars()
  )
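VSALL differs from the other topic variables: ASMNTDN is present on every raw record, so if VSTESTCD were hardcoded wherever the raw value exists (the behaviour assumed for SYSBP above), every record would get VSALL. Wrapping the raw data in condition_add() restricts the mapping to ASMNTDN == 1. Below is a base R sketch of the intended result, using the ASMNTDN values from the raw data above and illustrative object names. It is not how condition_add() is implemented.

```r
# Record keys and ASMNTDN values as printed in the raw data above
asmnt_demo <- data.frame(
  oak_id = 1:6,
  ASMNTDN = c(0L, 0L, 0L, 1L, 0L, 0L)
)

# Assumption (as for SYSBP): without a condition, any record with a
# non-missing ASMNTDN would qualify. Applying the condition first limits
# the hardcoded value to records where ASMNTDN == 1.
asmnt_demo$VSTESTCD <- ifelse(asmnt_demo$ASMNTDN == 1L, "VSALL", NA_character_)

vsall_topic <- asmnt_demo[!is.na(asmnt_demo$VSTESTCD), c("oak_id", "VSTESTCD")]
print(vsall_topic)
```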

Now that all topic variables and their qualifiers are mapped, combine the datasets and map the qualifier, identifier, and timing variables that apply to all topic variables.

# Combine all the topic variables into a single data frame and map qualifiers
# applicable to all topic variables
vs <- dplyr::bind_rows(
  vs_vsall, vs_sysbp, vs_diabp, vs_pulse, vs_resp,
  vs_temp, vs_oxysat
) %>%
  # Map qualifiers common to all topic variables
  # Map VSDTC using assign_datetime algorithm
  assign_datetime(
    raw_dat = vs_raw,
    raw_var = c("VTLD", "VTLTM"),
    tgt_var = "VSDTC",
    raw_fmt = c(list(c("d-m-y", "dd-mmm-yyyy")), "H:M")
  ) %>%
  # Map VSTPT from TMPTC using assign_ct
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "TMPTC",
    tgt_var = "VSTPT",
    ct_spec = study_ct,
    ct_clst = "TPT",
    id_vars = oak_id_vars()
  ) %>%
  # Map VSTPTNUM from TMPTC using assign_ct
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "TMPTC",
    tgt_var = "VSTPTNUM",
    ct_spec = study_ct,
    ct_clst = "TPTNUM",
    id_vars = oak_id_vars()
  ) %>%
  # Map VISIT from INSTANCE using assign_ct
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "INSTANCE",
    tgt_var = "VISIT",
    ct_spec = study_ct,
    ct_clst = "VISIT",
    id_vars = oak_id_vars()
  ) %>%
  # Map VISITNUM from INSTANCE using assign_ct
  assign_ct(
    raw_dat = vs_raw,
    raw_var = "INSTANCE",
    tgt_var = "VISITNUM",
    ct_spec = study_ct,
    ct_clst = "VISITNUM",
    id_vars = oak_id_vars()
  )
oak_id  raw_source  patient_number  VSTESTCD  VSTEST                    VSORRES  VSORRESU     VSPOS           VSLAT  VSDTC             VSTPT     VSTPTNUM  VISIT   VISITNUM
1       vitals      375             SYSBP     Systolic Blood Pressure   158.00   mmHg         PRONE           NA     2015-05-16T07:25  PREDOSE   1         VISIT1  VISIT1
1       vitals      375             DIABP     Diastolic Blood Pressure  92.00    mmHg         PRONE           NA     2015-05-16T07:25  PREDOSE   1         VISIT1  VISIT1
1       vitals      375             PULSE     Pulse Rate                63.00    beats/min    NA              NA     2015-05-16T07:25  PREDOSE   1         VISIT1  VISIT1
1       vitals      375             RESP      Respiratory Rate          17.00    breaths/min  NA              NA     2015-05-16T07:25  PREDOSE   1         VISIT1  VISIT1
1       vitals      375             TEMP      Temperature               40.48    C            NA              NA     2015-05-16T07:25  PREDOSE   1         VISIT1  VISIT1
1       vitals      375             OXYSAT    Oxygen Saturation         98.00    %            NA              RIGHT  2015-05-16T07:25  PREDOSE   1         VISIT1  VISIT1
2       vitals      375             SYSBP     Systolic Blood Pressure   94.00    mmHg         SEMI-RECUMBENT  NA     2015-05-16T10:25  POSTDOSE  2         VISIT1  VISIT1
2       vitals      375             DIABP     Diastolic Blood Pressure  78.00    mmHg         SEMI-RECUMBENT  NA     2015-05-16T10:25  POSTDOSE  2         VISIT1  VISIT1
2       vitals      375             PULSE     Pulse Rate                76.00    beats/min    NA              NA     2015-05-16T10:25  POSTDOSE  2         VISIT1  VISIT1
2       vitals      375             RESP      Respiratory Rate          20.00    breaths/min  NA              NA     2015-05-16T10:25  POSTDOSE  2         VISIT1  VISIT1
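The VSDTC values above combine VTLD (for example, 16-May-15) and VTLTM (7:25) into an ISO 8601 date-time such as 2015-05-16T07:25. The base R sketch below reproduces that conversion for complete dates and times. iso_dtc_sketch() is a hypothetical helper: it does not handle partial or unknown dates, and its two-digit-year rule is an assumption borrowed from the %y convention of strptime().

```r
# Hypothetical helper: build an ISO 8601 date-time from a "d-Mon-yy" date and
# an "H:M" time. Assumes complete, well-formed inputs (no partial dates).
iso_dtc_sketch <- function(date_chr, time_chr) {
  dt <- strsplit(date_chr, "-", fixed = TRUE)
  day <- as.integer(vapply(dt, `[`, character(1), 1))
  # month.abb is a base R constant (Jan, Feb, ...) that does not depend on locale
  mon <- match(vapply(dt, `[`, character(1), 2), month.abb)
  yy <- as.integer(vapply(dt, `[`, character(1), 3))
  # Two-digit years: 00-68 -> 20xx, 69-99 -> 19xx (the strptime %y convention)
  year <- ifelse(yy <= 68L, 2000L + yy, 1900L + yy)

  tm <- strsplit(time_chr, ":", fixed = TRUE)
  hour <- as.integer(vapply(tm, `[`, character(1), 1))
  minute <- as.integer(vapply(tm, `[`, character(1), 2))

  sprintf("%04d-%02d-%02dT%02d:%02d", year, mon, day, hour, minute)
}

iso_dtc_sketch(c("16-May-15", "23-Oct-08"), c("7:25", "3:19"))
```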

Create SDTM derived variables

Create the derived variables that apply to all topic variables: STUDYID, DOMAIN, VSCAT, USUBJID, and the study day VSDY.

vs <- vs %>%
  dplyr::mutate(
    STUDYID = "test_study",
    DOMAIN = "VS",
    VSCAT = "VITAL SIGNS",
    USUBJID = paste0("test_study", "-", .data$patient_number)
  ) %>%
  # derive_seq(tgt_var = "VSSEQ",
  #            rec_vars= c("USUBJID", "VSTRT")) %>%
  derive_study_day(
    sdtm_in = .,
    dm_domain = dm,
    tgdt = "VSDTC",
    refdt = "RFXSTDTC",
    study_day_var = "VSDY"
  ) %>%
  dplyr::select("STUDYID", "DOMAIN", "USUBJID", dplyr::everything())
STUDYID     DOMAIN  USUBJID         VSTESTCD  VSTEST                    VSORRES  VSORRESU     VSPOS           VSLAT  VSTPT     VSTPTNUM  VISIT   VISITNUM  VSDTC       VSDY
test_study  VS      test_study-375  SYSBP     Systolic Blood Pressure   158.00   mmHg         PRONE           NA     PREDOSE   1         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  DIABP     Diastolic Blood Pressure  92.00    mmHg         PRONE           NA     PREDOSE   1         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  PULSE     Pulse Rate                63.00    beats/min    NA              NA     PREDOSE   1         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  RESP      Respiratory Rate          17.00    breaths/min  NA              NA     PREDOSE   1         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  TEMP      Temperature               40.48    C            NA              NA     PREDOSE   1         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  OXYSAT    Oxygen Saturation         98.00    %            NA              RIGHT  PREDOSE   1         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  SYSBP     Systolic Blood Pressure   94.00    mmHg         SEMI-RECUMBENT  NA     POSTDOSE  2         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  DIABP     Diastolic Blood Pressure  78.00    mmHg         SEMI-RECUMBENT  NA     POSTDOSE  2         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  PULSE     Pulse Rate                76.00    beats/min    NA              NA     POSTDOSE  2         VISIT1  VISIT1    2015-05-16  -2890
test_study  VS      test_study-375  RESP      Respiratory Rate          20.00    breaths/min  NA              NA     POSTDOSE  2         VISIT1  VISIT1    2015-05-16  -2890
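VSDY is the study day of VSDTC relative to the subject's RFXSTDTC in DM. The sketch below applies the usual SDTM convention, where the reference date is day 1 and there is no day 0. study_day_sketch() and its dates are hypothetical: the helper does not read DM and is not the implementation of derive_study_day().

```r
# Hypothetical helper applying the SDTM study day rule:
# on or after the reference date, day = difference + 1 (reference date is day 1);
# before it, day = difference (there is no day 0).
study_day_sketch <- function(dtc, ref_dtc) {
  d <- as.Date(substr(dtc, 1, 10))       # date part of an ISO 8601 value
  ref <- as.Date(substr(ref_dtc, 1, 10))
  delta <- as.integer(d - ref)
  ifelse(delta >= 0L, delta + 1L, delta)
}

# Made-up dates relative to a reference date of 2015-05-20
study_day_sketch(
  dtc = c("2015-05-16T07:25", "2015-05-20", "2015-05-21"),
  ref_dtc = "2015-05-20"
)
```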

Add Labels and Attributes

Yet to be developed.