--- title: "Manipulate codelists" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{a08_ManipulateCodelists} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = NOT_CRAN) ``` ```{r, include = FALSE} CDMConnector::requireEunomia("synpuf-1k", "5.3") ``` ## Introduction: Manipulate codelists This vignette introduces a set of functions designed to manipulate and explore codelists within an OMOP CDM. Specifically, we will learn how to: - **Subset a codelist** to keep only codes meeting a certain criteria. - **Stratify a codelist** based on attributes like dose unit or route of administration. - **Add or exclude** concepts from a codelists. - **Compare two codelists** to identify shared and unique concepts. First of all, we will load the required packages and connect to a mock database. ```{r, warning=FALSE, message=FALSE} library(DBI) library(duckdb) library(dplyr) library(CDMConnector) library(CodelistGenerator) # Download mock database requireEunomia(datasetName = "synpuf-1k", cdmVersion = "5.3") # Connect to the database and create the cdm object con <- dbConnect(duckdb(), eunomiaDir("synpuf-1k", "5.3")) cdm <- cdmFromCon(con = con, cdmName = "Eunomia Synpuf", cdmSchema = "main", writeSchema = "main", achillesSchema = "main") ``` We will start by generating a codelist for *acetaminophen* using `getDrugIngredientCodes()` ```{r, warning=FALSE, message=FALSE} acetaminophen <- getDrugIngredientCodes(cdm, name = "acetaminophen", nameStyle = "{concept_name}", type = "codelist") ``` ### Subsetting a Codelist Subsetting a codelist will allow us to reduce a codelist to only those concepts that meet certain conditions. #### Subset by Domain We will now subset to those concepts that have `domain = "Drug"`. Remember that, to see the domains available in your codelist, you can use `associatedDomains()`. ```{r, warning=FALSE, messages=FALSE} acetaminophen_drug <- subsetOnDomain(acetaminophen, cdm, domain = "Drug") acetaminophen_drug ``` We can use the `negate` argument to exclude concepts with a certain domain: ```{r, warning=FALSE, messages=FALSE} acetaminophen_no_drug <- subsetOnDomain(acetaminophen, cdm, domain = "Drug", negate = TRUE) acetaminophen_no_drug ``` #### Subset on vocabulary We will now subset the codelist to only include concepts from RxNorm vocabulary. You can also use `associatedVocabularies()` to explore the vocabularies available in your codelist. ```{r, warning=FALSE, messages=FALSE} acetaminophen_rxnorm <- subsetOnVocabulary(acetaminophen_drug, cdm, c("RxNorm")) acetaminophen_rxnorm ``` #### Subset on Dose Unit We will now filter to only include concepts with specified dose units. Remember that you can use `associatedDoseUnits()` to explore the dose units available in your codelist. ```{r, warning=FALSE, messages=FALSE} acetaminophen_mg_unit <- subsetOnDoseUnit(acetaminophen_rxnorm, cdm, c("milligram", "unit")) acetaminophen_mg_unit ``` As before, we can use argument `negate = TRUE` to exclude instead. #### Subset on ingredient range We can now subset on those drugs with 3 to 30 ingredients: ```{r, warning=FALSE, messages=FALSE} acetaminophen_ingredient <- subsetOnIngredientRange(acetaminophen_drug, cdm, ingredientRange = c(3, 30)) acetaminophen_ingredient ``` Notice that `negate = TRUE` would keep all those concepts with less than 3 ingredients or more than 30 (without including those with 3 or 30 ingredients). #### Subset on route category We will now subset to those concepts that do not have an "unclassified_route" or "transmucosal_rectal". See `associatedRouteCategories()` to explore route categories available in your codelist. ```{r, warning=FALSE, messages=FALSE} acetaminophen_route <- subsetOnRouteCategory(acetaminophen_mg_unit, cdm, c("transmucosal_rectal","unclassified_route"), negate = TRUE) acetaminophen_route ``` #### Subset on dose forms We will now subset to those concepts with specific dose forms. See `associatedDseForms()` to explore dose forms available in your codelist. ```{r, warning=FALSE, messages=FALSE} # First, check which dose forms are available in our codelist acetaminophen_drug |> associatedDoseForms(cdm) acetaminophen_oral <- subsetOnDoseForm(acetaminophen_drug, cdm, c("Oral Solution","Oral Capsule")) acetaminophen_oral ``` ### Stratify codelist Instead of filtering, stratification allows us to split a codelist into subgroups based on defined vocabulary properties. #### Stratify by Dose Unit ```{r, warning=FALSE, messages=FALSE} acetaminophen_doses <- stratifyByDoseUnit(acetaminophen, cdm, keepOriginal = TRUE) acetaminophen_doses ``` #### Stratify by Route Category ```{r, warning=FALSE, messages=FALSE} acetaminophen_routes <- stratifyByRouteCategory(acetaminophen, cdm) acetaminophen_routes ``` #### Stratify by Dose Form ```{r, warning=FALSE, messages=FALSE} acetaminophen_dose_forms <- stratifyByDoseForm(acetaminophen, cdm) acetaminophen_dose_forms ``` ## Add or remove concepts from a codelist We can also add specific concepts to our codelist. For example, we will add the ingredient "acetaminophen" to all our codelists: ```{r, warning=FALSE, messages=FALSE} acetaminophen_routes1 <- addConcepts(acetaminophen_routes, cdm, concepts = c(1125315L)) acetaminophen_routes1 ``` Or we can add acetaminophen + descendants, and only to some of the codelists ```{r, warning=FALSE, messages=FALSE} x <- getDescendants(cdm = cdm, conceptId = c(1125315L)) acetaminophen_routes2 <- addConcepts(acetaminophen_routes, cdm, concepts = x$concept_id, codelistName = "acetaminophen_unclassified_route_category") acetaminophen_routes2 ``` And similarly, we can exclude specific concepts and their descendants from our codelist: ```{r, warning=FALSE, messages=FALSE} acetaminophen_routes3 <- excludeConcepts(acetaminophen_routes, cdm, concepts = x$concept_id, codelistName = "acetaminophen_inhalable") acetaminophen_routes3 ``` Notice that in this case, the codelist "acetaminophen_inhalable" is removed as there are no elements left after the exclusion of the code 35873016 and its descendants. ### Codelist construction Notice that all the functions introduced previously are "pipeble", allowing for a tidy and clear codelist construction: ```{r, warning=FALSE, messages=FALSE} acetaminophen <- getDrugIngredientCodes(cdm, name = "acetaminophen", nameStyle = "{concept_name}", type = "codelist") new_codelist <- acetaminophen |> addConcepts(cdm, concepts = c(1L, 2L, 3L)) |> subsetOnDomain(cdm, domain = "Drug") |> stratifyByDoseUnit(cdm = cdm) |> excludeConcepts(cdm, concepts = c(1127898)) new_codelist ``` ### Compare codelists Now we will compare two codelists to identify overlapping and unique codes. ```{r, warning=FALSE, messages=FALSE} acetaminophen <- getDrugIngredientCodes(cdm, name = "acetaminophen", nameStyle = "{concept_name}", type = "codelist_with_details") hydrocodone <- getDrugIngredientCodes(cdm, name = "hydrocodone", doseUnit = "milligram", nameStyle = "{concept_name}", type = "codelist_with_details") ``` Compare the two sets: ```{r} comparison <- compareCodelists(acetaminophen, hydrocodone) comparison |> glimpse() comparison |> filter(codelist == "Both") ```