This vignette describes the process for creating a concept database and SNOMED composition lookup for use with named entity recognition.
Before using the code in this vignette, please follow the vignette ‘Using SNOMED dictionaries and codelists’ to download the NHS SNOMED distribution and create an R SNOMED dictionary.
MedCAT is a named entity recognition and linking system (NER+L) that uses self-supervised training on a large corpus of texts to learn the context surrounding definitive mentions of clinical concepts, and uses this context information to disambiguate between different meanings of acronyms or ambiguous terms. This package creates a concept database file in MedCAT format.
MiADE is a natural language processing system for real-time extraction of structured information from clinical notes. MiADE incorporates MedCAT NER+L to select SNOMED CT concepts, with pre-processing (paragraph chunking) and post-processing (conversion of suspected, negated and historic concepts, and filtering). This package creates MiADE lookups for converting suspected, negated and historic concepts to precoordinated SNOMED CT concepts.
To create the MedCAT and MiADE lookups, the steps are:
A future version of MiADE will be able to detect diagnosis attributes such as severity and body site separately to the pathology, and then combine them into the most precise and accurate SNOMED CT concept available.
The first stage is to create ‘decompositions’ of SNOMED CT concepts. This uses the SNOMED CT concept model together with text parsing to break a concept down into components in a number of different ways. Example code is given below.
library(Rdiagnosislist)
# Load the SNOMED dictionary (for this example we are using the
# sample included with the package)
SNOMED <- sampleSNOMED()
# Create a concept database environment
miniCDB <- createCDB(SNOMED = SNOMED)
## Initialising causes.
## Creating transitive closure table.
## Transitive closure table created, 5094 rows.
## Initialising findings and qualifiers.
## Initialising body structures.
## Creating severity and stage lists.
## Creating lists of lateralised structures.
## Creating indices for fast searching
##
## Decomposing 83291003: cor pulmonale
##
## After splitting by parts
## parttext rootId with due_to after without body_site severity stage
## <char> <i64> <i64> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: <NA> 83291003 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## laterality other_attr text
## <i64> <char> <char>
## 1: <NA> @ @ cor pulmonale
##
## No valid ancestors, trying morphologies
##
## No valid morphologies
##
## Finding ancestors or morphologies
## parttext rootId with due_to after without body_site severity stage
## <char> <i64> <i64> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: cor pulmonale 83291003 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## laterality other_attr text partId
## <i64> <char> <char> <i64>
## 1: <NA> @ @ cor pulmonale 83291003
##
## Unable to extract body site for this disorder
##
## Finding causes
## parttext rootId with due_to after without body_site severity stage
## <char> <i64> <i64> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: cor pulmonale 83291003 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## laterality other_attr text partId
## <i64> <char> <char> <i64>
## 1: <NA> @ @ cor pulmonale 83291003
##
## Finding other attributes
## parttext rootId with due_to after without body_site severity stage
## <char> <i64> <i64> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: cor pulmonale 83291003 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## laterality other_attr text partId other_conceptId
## <i64> <char> <char> <i64> <char>
## 1: <NA> @ @ cor pulmonale 83291003
##
## Decomposing 83291003: right heart failure due to disorder of lung
## Splitting due_to at due to|caused by
##
## After splitting by parts
## parttext rootId with due_to after without body_site severity stage
## <char> <i64> <i64> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: <NA> 128404006 <NA> 19829001 <NA> <NA> <NA> <NA> <NA>
## laterality other_attr text
## <i64> <char> <char>
## 1: <NA> @ @ @ right heart failure
##
## Finding ancestors or morphologies
## parttext rootId with due_to after without body_site
## <char> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: right heart failure 84114007 <NA> 19829001 <NA> <NA> <NA>
## 2: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 3: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 4: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 5: right heart failure 367363000 <NA> 19829001 <NA> <NA> <NA>
## 6: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## severity stage laterality other_attr text partId
## <i64> <i64> <i64> <char> <char> <i64>
## 1: <NA> <NA> <NA> right @ @ heart failure 128404006
## 2: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 3: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 4: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 5: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 6: <NA> <NA> <NA> @ @ @ right heart failure 128404006
##
## Unable to extract body site for this disorder
##
## Finding causes
## parttext rootId with due_to after without body_site
## <char> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: right heart failure 84114007 <NA> 19829001 <NA> <NA> <NA>
## 2: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 3: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 4: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 5: right heart failure 367363000 <NA> 19829001 <NA> <NA> <NA>
## 6: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## severity stage laterality other_attr text partId
## <i64> <i64> <i64> <char> <char> <i64>
## 1: <NA> <NA> <NA> right @ @ heart failure 128404006
## 2: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 3: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 4: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 5: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 6: <NA> <NA> <NA> @ @ @ right heart failure 128404006
##
## Finding other attributes
## parttext rootId with due_to after without body_site
## <char> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: right heart failure 84114007 <NA> 19829001 <NA> <NA> <NA>
## 2: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 3: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 4: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 5: right heart failure 367363000 <NA> 19829001 <NA> <NA> <NA>
## 6: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## severity stage laterality other_attr text partId
## <i64> <i64> <i64> <char> <char> <i64>
## 1: <NA> <NA> <NA> right @ @ heart failure 128404006
## 2: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 3: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 4: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 5: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 6: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## other_conceptId
## <char>
## 1: right
## 2:
## 3:
## 4:
## 5:
## 6:
##
## Decomposing 83291003: right heart failure due to pulmonary disease
## Splitting due_to at due to|caused by
##
## After splitting by parts
## parttext rootId with due_to after without body_site severity stage
## <char> <i64> <i64> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: <NA> 128404006 <NA> 19829001 <NA> <NA> <NA> <NA> <NA>
## laterality other_attr text
## <i64> <char> <char>
## 1: <NA> @ @ @ right heart failure
##
## Finding ancestors or morphologies
## parttext rootId with due_to after without body_site
## <char> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: right heart failure 84114007 <NA> 19829001 <NA> <NA> <NA>
## 2: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 3: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 4: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 5: right heart failure 367363000 <NA> 19829001 <NA> <NA> <NA>
## 6: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## severity stage laterality other_attr text partId
## <i64> <i64> <i64> <char> <char> <i64>
## 1: <NA> <NA> <NA> right @ @ heart failure 128404006
## 2: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 3: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 4: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 5: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 6: <NA> <NA> <NA> @ @ @ right heart failure 128404006
##
## Unable to extract body site for this disorder
##
## Finding causes
## parttext rootId with due_to after without body_site
## <char> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: right heart failure 84114007 <NA> 19829001 <NA> <NA> <NA>
## 2: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 3: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 4: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 5: right heart failure 367363000 <NA> 19829001 <NA> <NA> <NA>
## 6: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## severity stage laterality other_attr text partId
## <i64> <i64> <i64> <char> <char> <i64>
## 1: <NA> <NA> <NA> right @ @ heart failure 128404006
## 2: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 3: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 4: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 5: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 6: <NA> <NA> <NA> @ @ @ right heart failure 128404006
##
## Finding other attributes
## parttext rootId with due_to after without body_site
## <char> <i64> <i64> <i64> <i64> <i64> <i64>
## 1: right heart failure 84114007 <NA> 19829001 <NA> <NA> <NA>
## 2: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 3: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 4: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## 5: right heart failure 367363000 <NA> 19829001 <NA> <NA> <NA>
## 6: right heart failure 128404006 <NA> 19829001 <NA> <NA> <NA>
## severity stage laterality other_attr text partId
## <i64> <i64> <i64> <char> <char> <i64>
## 1: <NA> <NA> <NA> right @ @ heart failure 128404006
## 2: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 3: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 4: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 5: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## 6: <NA> <NA> <NA> @ @ @ right heart failure 128404006
## other_conceptId
## <char>
## 1: right
## 2:
## 3:
## 4:
## 5:
## 6:
##
## --------------------------------------------------------------------------------
## 83291003 | Cor pulmonale (disorder)
## --------------------------------------------------------------------------------
## Root : 128404006 | Right heart failure (disorder)
## - Due to : 19829001 | Disorder of lung (disorder)
##
## --------------------------------------------------------------------------------
## 83291003 | Cor pulmonale (disorder)
## --------------------------------------------------------------------------------
## Root : 367363000 | Right ventricular failure (disorder)
## - Due to : 19829001 | Disorder of lung (disorder)
##
## --------------------------------------------------------------------------------
## 83291003 | Cor pulmonale (disorder)
## --------------------------------------------------------------------------------
## Root : 128404006 | Right heart failure (disorder)
## - Due to : 19829001 | Disorder of lung (disorder)
##
## --------------------------------------------------------------------------------
## 83291003 | Cor pulmonale (disorder)
## --------------------------------------------------------------------------------
## Root : 367363000 | Right ventricular failure (disorder)
## - Due to : 19829001 | Disorder of lung (disorder)
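The log line ‘Splitting due_to at due to|caused by’ in the output above refers to the text-parsing part of decomposition. The idea can be illustrated with a minimal base R sketch. This is not the package's implementation: `split_due_to()` is a hypothetical helper written for this illustration, and it only separates the text of a description into a root part and a cause part, without linking either part to a SNOMED CT concept.

```r
# Toy illustration only (hypothetical helper, not part of Rdiagnosislist):
# split a description at the first causal connective.
split_due_to <- function(term, pattern = "due to|caused by") {
  m <- regexpr(pattern, term)
  start <- as.integer(m)
  if (start == -1L) {
    # No connective found: the whole term is the root and there is no cause
    return(list(root = trimws(term), due_to = NA_character_))
  }
  len <- attr(m, "match.length")
  list(
    # Text before the connective
    root = trimws(substr(term, 1, start - 1)),
    # Text after the connective
    due_to = trimws(substr(term, start + len, nchar(term)))
  )
}

split_due_to("right heart failure due to disorder of lung")
split_due_to("cor pulmonale")
```

In the real decomposition, the two pieces of text are then matched to concepts, which is how the synonym ‘right heart failure due to disorder of lung’ yields a root of ‘Right heart failure’ with a ‘Due to’ of ‘Disorder of lung’ in the output above.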
To create the composition lookups, the steps are:
The compose lookup table can now be used to refine SNOMED CT concepts using compose(), which selects a more specific concept based on supplied attributes.
Example code:
# Create SNOMED and CDB
# (path_to_snomed is a placeholder for the location of your SNOMED files)
SNOMED <- loadSNOMED(path_to_snomed)
CDB <- createCDB(SNOMED)
# Select SNOMED CT concepts to decompose
disorders <- descendants('Disorder', SNOMED = SNOMED)
# Batch decomposition
batchDecompose(disorders, CDB = CDB, SNOMED = SNOMED,
  output_filename = 'path_to_decompositions.csv')
# Create composition lookup
CL <- createComposeLookup('path_to_decompositions.csv',
  CDB = CDB, SNOMED = SNOMED)
# Use the composition lookup to refine a SNOMED CT concept
compose(conceptId = as.SNOMEDconcept('Fracture'),
  CDB = CDB, composeLookup = CL,
  attributes_conceptIds = as.SNOMEDconcept(c('Open', 'Femur')),
  due_to_conceptIds = bit64::integer64(0),
  without_conceptIds = bit64::integer64(0),
  with_conceptIds = bit64::integer64(0),
  SNOMED = SNOMED)
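To make the idea behind compose() more concrete, here is a minimal base R sketch of refining a root concept from a lookup table. It is an assumption-laden toy, not the package's algorithm: `toy_lookup` and `toy_compose()` are hypothetical names, plain text labels stand in for SNOMED CT concept IDs, and the rule ‘prefer the row that uses the most supplied attributes’ is a simplification chosen for this illustration.

```r
# Toy lookup (hypothetical): each row is one precoordinated concept, described
# by a root and the attributes it requires. Labels stand in for concept IDs.
toy_lookup <- data.frame(
  root = c("Fracture", "Fracture", "Fracture"),
  attributes = c("Femur", "Open", "Femur;Open"),
  result = c("Fracture of femur", "Open fracture", "Open fracture of femur"),
  stringsAsFactors = FALSE
)

toy_compose <- function(root, attributes, lookup = toy_lookup) {
  candidates <- lookup[lookup$root == root, , drop = FALSE]
  needed <- strsplit(candidates$attributes, ";", fixed = TRUE)
  # Keep rows whose required attributes were all supplied
  ok <- vapply(needed, function(x) all(x %in% attributes), logical(1))
  candidates <- candidates[ok, , drop = FALSE]
  needed <- needed[ok]
  # Nothing more specific is available: return the root unchanged
  if (nrow(candidates) == 0) return(root)
  # Prefer the row that uses the most attributes (the most specific one)
  candidates$result[which.max(lengths(needed))]
}

toy_compose("Fracture", c("Open", "Femur"))
toy_compose("Fracture", "Femur")
toy_compose("Fracture", character(0))
```

The sketch returns the root unchanged when no row fits, which mirrors the intent of refining a concept only when a more specific one is supported by the supplied attributes.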
The following attributes can be supplied to compose():
For more information about SNOMED CT, visit the SNOMED CT international website: https://www.snomed.org/
SNOMED CT (UK edition) can be downloaded from the NHS Digital site: https://isd.digital.nhs.uk/trud/user/guest/group/0/home
The NHS Digital terminology browser can be used to search for terms interactively: https://termbrowser.nhs.uk/
For more information about MiADE, visit https://www.ucl.ac.uk/health-informatics/research/miade/miade-software-and-availability
For more information about MedCAT, visit https://github.com/CogStack/MedCAT