A “snake_case” filter system for R.
if (!requireNamespace("remotes")) {
  install.packages("remotes")
}
remotes::install_github(
  repo = "openpharma/filters",
  upgrade = "never"
)
library(filters)
library(magrittr)
library(random.cdisc.data)
library(rtables)
library(tern)
set.seed(1)
adsl <- radsl()
adae <- radae(adsl)
vads <- list(adsl = adsl, adae = adae)
The {filters} package comes with a built-in filter library. You can
list its contents using list_all_filters().
list_all_filters()
# A tibble: 272 x 4
   id     title                      target condition                      
   <chr>  <chr>                      <chr>  <chr>                          
 1 COV    Confirmed/Suspected COVID… ADAE   ACOVFL == 'Y'                  
 2 COVAS  AEs Associated with COVID… ADAE   ACOVASFL == 'Y'                
 3 CTC35  Grade 3-5 Adverse Events   ADAE   ATOXGR %in% c('3', '4', '5')   
 4 DSC    Adverse Events Leading to… ADAE   AEACN == 'DRUG WITHDRAWN'      
 5 DSM    Adverse Events Leading to… ADAE   AEACN %in% c('DOSE INCREASED',…
 6 FATAL  Fatal Adverse Events       ADAE   AESDTH == 'Y'                  
 7 NCOV   Excluding Confirmed/Suspe… ADAE   ACOVFL != 'Y'                  
 8 NCOVAS AEs not Associated with C… ADAE   ACOVASFL != 'Y'                
 9 NFATAL Non-fatal Adverse Events   ADAE   AESDTH == 'N'                  
10 NREL   Adverse Events not Relate… ADAE   AREL == 'N'                    
# … with 262 more rows
To add a new filter, use add_filter(). The condition argument
defines the condition used to filter the
datasets later on. It is stored unevaluated and passed to subset() when
calling apply_filter().
add_filter(
  id = "CTC34",
  title = "Grade 3-4 Adverse Events",
  target = "ADAE",
  condition = AETOXGR %in% c("4", "5")
)
Alternatively, you can use load_filters() to load filter
definitions from a YAML file. The file should be structured like
this:
CTC4:
  title: Grade 4 Adverse Events
  target: ADAE
  condition: ATOXGR == "4"
TP53WT:
  title: TP53 Wild Type
  target: ADSL
  condition: TP53 == "WILD TYPE"
file_path <- system.file("filters_eg.yaml", package = "filters")
load_filters(file_path)
You can confirm that filters have been successfully added by using
get_filter().
get_filter("CTC34")
$title
[1] "Grade 3-4 Adverse Events"
$target
[1] "ADAE"
$condition
AETOXGR %in% c("4", "5")
If you ask for a filter that does not exist, get_filter() will
throw an error.
get_filter("GIDIS")
Error: Filter 'GIDIS' does not exist.
To overwrite an existing filter, you have to set
overwrite = TRUE. Otherwise an error is thrown.
add_filter(
  id = "FATAL",
  title = "Fatal Adverse Events",
  target = "ADAE",
  condition = ATOXGR == "5"
)
Error: Filter 'FATAL' already exists. Set `overwrite = TRUE` to force overwriting the existing filter definition.
add_filter(
  id = "FATAL",
  title = "Fatal Adverse Events",
  target = "ADAE",
  condition = ATOXGR == "5",
  overwrite = TRUE
)
You can use apply_filter() to filter a single dataset or
a list of multiple datasets.
adsl_se <- apply_filter(adsl, "SE")
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
adae_ctc34_ser <- apply_filter(adae, "CTC34_SER")
Filters 'CTC34', 'SER' matched target ADAE.
216/1967 records matched the filter condition `AETOXGR %in% c('4', '5') & AESER == 'Y'`.
filtered_datasets <- apply_filter(vads, "CTC34_SER_SE")
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Filters 'CTC34', 'SER' matched target ADAE.
216/1967 records matched the filter condition `AETOXGR %in% c('4', '5') & AESER == 'Y'`.
As you can see, apply_filter() gives you feedback on
which IDs matched the dataset. This matching is done by the name of the
input dataset. It does not matter whether the dataset name is in upper
or lower case or a mix of both.
ADSL <- adsl
adsl_it <- apply_filter(ADSL, "IT")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
In case your dataset is not named in a standard way, you can manually
tell apply_filter() which dataset it is by setting the
target argument.
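The name-based matching and the target override described above can be illustrated with a small base R sketch. It is not the package's actual implementation: resolve_target() is a hypothetical helper, and the sketch simply assumes that the target is derived from the name of the object passed in, upper-cased, unless a target is given explicitly.

```r
# Hypothetical helper (not part of {filters}): work out which target a
# dataset corresponds to.
resolve_target <- function(data, target = NULL) {
  if (is.null(target)) {
    # Capture the expression the caller passed as `data` and use it as a name
    target <- deparse(substitute(data))
  }
  # Upper-case the name so that adsl, ADSL and Adsl all resolve the same way
  toupper(target)
}

adsl <- data.frame(USUBJID = c("01", "02"))
Adsl <- adsl
sl <- adsl

resolve_target(adsl)                 # "ADSL"
resolve_target(Adsl)                 # "ADSL"
resolve_target(sl)                   # "SL", which no ADSL filter would match
resolve_target(sl, target = "ADSL")  # "ADSL"
```

Because the name is taken from the calling expression, a non-standard name such as sl only resolves to ADSL when the target is supplied explicitly.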
sl <- adsl
sl_it1 <- apply_filter(sl, "IT")
No filter matched target SL.
sl_it2 <- apply_filter(sl, "IT", target = "ADSL")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
The {filters} package works well with the {rtables}
and {tern} packages. The following example defines a function
that creates a table:
t_ae <- function(datasets) {
  anl <- merge(
    x = datasets$adsl,
    y = datasets$adae,
    by = c("STUDYID", "USUBJID"),
    all = FALSE, # inner join
    suffixes = c("", "_ADAE")
  )
  
  split_fun <- drop_split_levels
  lyt <- basic_table(show_colcounts = TRUE) %>%
    split_cols_by(var = "ARM") %>%
    add_overall_col(label = "All Patients") %>%
    analyze_num_patients(
      vars = "USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c(
        unique = "Total number of patients with at least one adverse event",
        nonunique = "Overall total number of events"
      )
    ) %>%
    split_rows_by(
      "AEBODSYS",
      child_labels = "visible",
      nested = FALSE,
      split_fun = split_fun,
      label_pos = "topleft",
      split_label = obj_label(adae$AEBODSYS)
    ) %>%
    summarize_num_patients(
      var = "USUBJID",
      .stats = c("unique", "nonunique"),
      .labels = c(
        unique = "Total number of patients with at least one adverse event",
        nonunique = "Total number of events"
      )
    ) %>%
    count_occurrences(
      vars = "AEDECOD",
      .indent_mods = -1L
    ) %>%
    append_varlabels(adae, "AEDECOD", indent = 1L)
  result <- build_table(
    lyt,
    df = anl, # merged data: AE records of subjects kept in ADSL
    alt_counts_df = datasets$adsl
  )
  return(result)
}
You can easily create multiple outputs with this function by applying
the filters to the input datasets before passing them to
t_ae().
vads %>% apply_filter("SE") %>% t_ae()
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Body System or Organ Class                                    A: Drug X    B: Placebo    C: Combination   All Patients
  Dictionary-Derived Term                                      (N=133)       (N=141)        (N=126)         (N=400)   
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Total number of patients with at least one adverse event     111 (83.5%)   132 (93.6%)    119 (94.4%)     362 (90.5%) 
Overall total number of events                                   636           755            655             2046    
cl A.1                                                                                                                
  Total number of patients with at least one adverse event   63 (47.4%)    79 (56.0%)      71 (56.3%)     213 (53.2%) 
  Total number of events                                         123           144            133             400     
  dcd A.1.1.1.1                                              47 (35.3%)    63 (44.7%)      50 (39.7%)     160 (40.0%) 
  dcd A.1.1.1.2                                              42 (31.6%)    47 (33.3%)      44 (34.9%)     133 (33.2%) 
cl B.1                                                                                                                
  Total number of patients with at least one adverse event   47 (35.3%)    49 (34.8%)      59 (46.8%)     155 (38.8%) 
  Total number of events                                         73            63              75             211     
  dcd B.1.1.1.1                                              47 (35.3%)    49 (34.8%)      59 (46.8%)     155 (38.8%) 
cl B.2                                                                                                                
  Total number of patients with at least one adverse event   73 (54.9%)    88 (62.4%)      73 (57.9%)     234 (58.5%) 
  Total number of events                                         132           156            137             425     
  dcd B.2.1.2.1                                              44 (33.1%)    56 (39.7%)      50 (39.7%)     150 (37.5%) 
  dcd B.2.2.3.1                                              48 (36.1%)    59 (41.8%)      44 (34.9%)     151 (37.8%) 
cl C.1                                                                                                                
  Total number of patients with at least one adverse event   50 (37.6%)    53 (37.6%)      42 (33.3%)     145 (36.2%) 
  Total number of events                                         62            75              62             199     
  dcd C.1.1.1.3                                              50 (37.6%)    53 (37.6%)      42 (33.3%)     145 (36.2%) 
cl C.2                                                                                                                
  Total number of patients with at least one adverse event   50 (37.6%)    65 (46.1%)      50 (39.7%)     165 (41.2%) 
  Total number of events                                         67            87              63             217     
  dcd C.2.1.2.1                                              50 (37.6%)    65 (46.1%)      50 (39.7%)     165 (41.2%) 
cl D.1                                                                                                                
  Total number of patients with at least one adverse event   74 (55.6%)    95 (67.4%)      72 (57.1%)     241 (60.2%) 
  Total number of events                                         120           158            112             390     
  dcd D.1.1.1.1                                              37 (27.8%)    59 (41.8%)      35 (27.8%)     131 (32.8%) 
  dcd D.1.1.4.2                                              54 (40.6%)    63 (44.7%)      48 (38.1%)     165 (41.2%) 
cl D.2                                                                                                                
  Total number of patients with at least one adverse event   43 (32.3%)    54 (38.3%)      56 (44.4%)     153 (38.2%) 
  Total number of events                                         59            72              73             204     
  dcd D.2.1.5.3                                              43 (32.3%)    54 (38.3%)      56 (44.4%)     153 (38.2%) 
vads %>% apply_filter("SER_SE") %>% t_ae()
Filter 'SE' matched target ADSL.
400/400 records matched the filter condition `SAFFL == 'Y'`.
Filter 'SER' matched target ADAE.
581/1967 records matched the filter condition `AESER == 'Y'`.
Body System or Organ Class                                   A: Drug X    B: Placebo    C: Combination   All Patients
  Dictionary-Derived Term                                     (N=133)       (N=141)        (N=126)         (N=400)   
—————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
Total number of patients with at least one adverse event     93 (69.9%)   110 (78.0%)     98 (77.8%)     301 (75.2%) 
Overall total number of events                                  248           280            246             774     
cl A.1                                                                                                               
  Total number of patients with at least one adverse event   42 (31.6%)   47 (33.3%)      44 (34.9%)     133 (33.2%) 
  Total number of events                                         54           63              58             175     
  dcd A.1.1.1.2                                              42 (31.6%)   47 (33.3%)      44 (34.9%)     133 (33.2%) 
cl B.1                                                                                                               
  Total number of patients with at least one adverse event   47 (35.3%)   49 (34.8%)      59 (46.8%)     155 (38.8%) 
  Total number of events                                         73           63              75             211     
  dcd B.1.1.1.1                                              47 (35.3%)   49 (34.8%)      59 (46.8%)     155 (38.8%) 
cl B.2                                                                                                               
  Total number of patients with at least one adverse event   48 (36.1%)   59 (41.8%)      44 (34.9%)     151 (37.8%) 
  Total number of events                                         74           78              65             217     
  dcd B.2.2.3.1                                              48 (36.1%)   59 (41.8%)      44 (34.9%)     151 (37.8%) 
cl D.1                                                                                                               
  Total number of patients with at least one adverse event   37 (27.8%)   59 (41.8%)      35 (27.8%)     131 (32.8%) 
  Total number of events                                         47           76              48             171     
  dcd D.1.1.1.1                                              37 (27.8%)   59 (41.8%)      35 (27.8%)     131 (32.8%) 
The filters you created using add_filter() only persist
for the duration of your R session. That means that
whenever you restart your R session you will have to
re-create them. The simplest way to do so is to put all your filter
definitions inside a filters.yml file as described
above and call load_filters("path/to/filters.yml") before
creating outputs.
If you pass an existing filter that does not match your target
dataset no warning or error is thrown. Instead
apply_filter() only tells you which filters it actually
used. Thus, checking that only valid filters are passed to
apply_filter() is up to you.
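Since that check is left to the caller, one option is a small pre-flight helper. The sketch below is an illustration only: unmatched_filters() is a hypothetical function, not part of {filters}, and it assumes that a suffix is a list of filter IDs separated by underscores, as in the calls shown earlier. The lookup argument can be any function that returns a list with a target element for a given ID; get_filter() has that shape, and a stand-in is used here so the example runs without the package.

```r
# Hypothetical helper: return the IDs in `suffix` whose registered target
# differs from `target`, so they can be reported before filtering.
unmatched_filters <- function(suffix, target, lookup) {
  ids <- strsplit(suffix, "_", fixed = TRUE)[[1]]
  targets <- vapply(ids, function(id) lookup(id)$target, character(1))
  ids[toupper(targets) != toupper(target)]
}

# Stand-in registry and lookup function (assumptions for this example only)
demo_registry <- list(
  IT    = list(target = "ADSL"),
  INFCT = list(target = "ADAE")
)
demo_lookup <- function(id) demo_registry[[id]]

# INFCT targets ADAE, so it would be silently skipped for an ADSL dataset
unmatched_filters("INFCT_IT", target = "ADSL", lookup = demo_lookup)
```

With the package loaded, passing lookup = get_filter should work the same way, and a non-empty result could be turned into a warning or an error as needed.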
add_filter(
  id = "INFCT",
  title = "Infections and Infestations",
  target = "ADAE",
  condition = AEBODSYS == "INFECTIONS AND INFESTATIONS"
)
adsl_filtered <- apply_filter(adsl, "INFCT_IT")
Filter 'IT' matched target ADSL.
400/400 records matched the filter condition `ITTFL == 'Y'`.
Internally, {filters} stores the filter definitions
inside the .filters environment defined in
R/zzz.R. When you add a filter with
add_filter() a new variable with the name of the ID is
created inside this environment. This variable is a list that stores the
title, target and condition as a quoted expression. When you use
apply_filter() the function looks for variables in
.filters matching the provided suffixes. It then maps the
filters to their target datasets and finally builds a call to
subset() with the dataset as the first argument and the combined condition
of the matching filters as the second. This call is then evaluated using
eval() and the result is returned.
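The mechanism described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the code in R/zzz.R: filter_env stands in for the .filters environment, sketch_add_filter() and sketch_apply_filter() are hypothetical names, the suffix is assumed to be split on underscores, and the target is passed explicitly instead of being derived from the dataset name.

```r
# Environment acting as the filter registry (stand-in for .filters)
filter_env <- new.env()

# Store title, target and the unevaluated condition under the filter ID
sketch_add_filter <- function(id, title, target, condition, overwrite = FALSE) {
  if (exists(id, envir = filter_env, inherits = FALSE) && !overwrite) {
    stop("Filter '", id, "' already exists.")
  }
  definition <- list(
    title = title,
    target = target,
    condition = substitute(condition) # quoted expression, not evaluated here
  )
  assign(id, definition, envir = filter_env)
  invisible(NULL)
}

sketch_apply_filter <- function(data, suffix, target) {
  # Look up every ID in the suffix; mget() errors if an ID is unknown
  ids <- strsplit(suffix, "_", fixed = TRUE)[[1]]
  defs <- mget(ids, envir = filter_env)
  # Keep only the filters registered for this target
  defs <- Filter(function(d) toupper(d$target) == toupper(target), defs)
  if (length(defs) == 0) {
    return(data)
  }
  # Combine the quoted conditions with `&`
  conditions <- lapply(defs, `[[`, "condition")
  combined <- Reduce(function(a, b) call("&", a, b), conditions)
  # Build subset(data, <combined condition>) and evaluate it
  eval(bquote(subset(data, .(combined))))
}

# Toy data and filters, made up for this example
adae <- data.frame(
  USUBJID = c("01", "01", "02", "03"),
  AESER   = c("Y", "N", "Y", "N"),
  AETOXGR = c("4", "2", "1", "5")
)
sketch_add_filter("SER", "Serious Adverse Events", "ADAE", AESER == "Y")
sketch_add_filter("G45", "Grade 4-5 Adverse Events", "ADAE", AETOXGR %in% c("4", "5"))
sketch_add_filter("SE", "Safety Evaluable", "ADSL", SAFFL == "Y")

# SE targets ADSL and is skipped; G45 and SER are combined with `&`
res <- sketch_apply_filter(adae, "G45_SER_SE", target = "ADAE")
nrow(res) # 1
```

Storing the condition as a quoted expression is what allows it to refer to columns such as AESER that only exist once a dataset is supplied.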