--- title: "Characterizing Accuracy of Model Predictions for Concentration in High Throughput Screening Assays" author: "Meredith N. Scherer, Katie Paul-Friedman, and John F. Wambaugh" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{h) Scherer (Submitted): In Vitro Distribution} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- *Please send questions to wambaugh.john@epa.gov* from "Characterizing Accuracy of Model Predictions for Concentration in High Throughput Screening Assays" Meredith Scherer, Madison Feshuk, James M. Armitage, Jon A. Arnot, David Crizer, Michael Devito, Stephen S. Ferguson, Kimberly Freeman, Josh Harrill, Nynke Kramer, Alessandro Sangion, Caroline Stevens, Katie Paul Friedman, John F. Wambaugh ### update w/paper info + link to cleared dataset ## Abstract The aim of toxicological in vitro - in vivo extrapolation (IVIVE) is to predict whole animal effects from perturbations observed in vitro. Reliance on nominal (administered) concentration does not account for chemical distribution to cellular and non-cellular elements of the in vitro assay system. This chemical distribution reduces the free concentration available to cause effects and can result in an inaccurate estimate of the intracellular concentration causing any observed perturbations. There are mathematical, chemical property-based partitioning models for predicting cellular concentrations when not measured experimentally. This work evaluated two of these in vitro disposition models: Armitage et al., 2014 and Kramer et al., 2010 using a total of 153 experimental intracellular concentration measurements of 43 chemicals, along with parameters describing measurement conditions, from 12 peer reviewed studies in addition to data generated for this study. Intracellular concentrations were more accurately predicted by both the Armitage model (root mean squared log10 error (RMSLE) = 1.12) and the Kramer model (RMSLE = 1.30) than by the nominal concentration (RMSLE = 1.45). Although limited by the amount of available measurement data, these results indicate that mathematical modeling of in vitro distribution can improve the accuracy of IVIVE. ## HTTK Version This vignette was created with **httk** v2.X.0. Although we attempt to maintain backward compatibility, if you encounter issues with the latest release of **httk** and cannot easily address the changes, historical versions of httk are available from: https://cran.r-project.org/src/contrib/Archive/httk/ ## Prepare for session R package **knitr** generates html and PDF documents from this RMarkdown file, Each bit of code that follows is known as a "chunk". We start by telling **knitr** how we want our chunks to look. ```{r configure_knitr, eval = TRUE} knitr::opts_chunk$set(collapse = TRUE, comment = '#>') options(rmarkdown.html_vignette.check_title = FALSE) ``` ### Clear the memory It is a bad idea to let variables and other information from previous R sessions float around, so we first remove everything in the R memory. ```{r clear_memory, eval = TRUE} rm(list=ls()) ``` ### eval = execute.vignette If you are using the RMarkdown version of this vignette (extension, .RMD) you will be able to see that several chunks of code in this vignette have the statement "eval = execute.vignette". The next chunk of code, by default, sets execute.vignette = FALSE. This means that the code is included (and necessary) but was not run when the vignette was built. We do this because some steps require extensive computing time and the checks on CRAN limit how long we can spend building the package. If you want this vignette to work, you must run all code, either by cutting and pasting it into R. Or, if viewing the .RMD file, you can either change execute.vignette to TRUE or press "play" (the green arrow) on each chunk in RStudio. ```{r runchunks, eval = TRUE} # Set whether or not the following chunks will be executed (run): execute.vignette <- FALSE ``` ### Load the relevant libraries We use the command 'library()' to load various R packages for our analysis. If you get the message "Error in library(X) : there is no package called 'X'" then you will need to install that package: From the R command prompt: install.packages("X") Or, if using RStudio, look for 'Install Packages' under 'Tools' tab. ```{r load_packages, eval = execute.vignette} # # # Setup # # library(httk) # run Armitage.R in vitro distribution model library(tidyverse) # data wrangling library(data.table) # data table functionality library(ggplot2) # plotting library(ggrepel) # jitter labels in plots library(ggpubr) # arrange plots for publication library(viridis) # colorblind palette library(scales) # required for label_log #save figures and tables to working directory: yes/no save_output = FALSE #for saving outputs data.date <- "16JUL2025" path_out <- "C:/Users/mscherer/OneDrive - Environmental Protection Agency (EPA) /Profile/Desktop/IVDManuscriptData/manuscript figures 16JUL25/" ``` ```{r rmsle_func, eval = execute.vignette} #function to calculate RMSLE sle <- function (actual, predicted) { log.act <- log10(actual) log.pred <- log10(predicted) good <- !is.na(log.act) & is.finite(log.act) & !is.na(log.pred) & is.finite(log.pred) log.act <- log.act[good] log.pred <- log.pred[good] return((log.act - log.pred)^2) } msle <- function (actual, predicted) { mean(sle(actual, predicted)) } rmsle <-function (actual, predicted) { sqrt(msle(actual, predicted)) } ``` # Data Collection Data sets were curated from the literature and combined with two in house experiments to evaluate the in vitro distribution models. # Load and format the data ```{r load_IVD_Manuscript_data, eval = execute.vignette} IVD_Manuscript_data <- httk::Scherer2025.IVD cat(paste("Summarized data from ", length(unique(IVD_Manuscript_data$Citation)), " studies covering ", length(unique(IVD_Manuscript_data$ChemicalName)), " unique chemicals ", "with ", length(IVD_Manuscript_data$CellConcentration_uM), " observations total.", sep="") ) #14 studies, 43 chemicals, 153 observations #number each row IVD_Manuscript_data[,X:=1:length(IVD_Manuscript_data$ChemicalName)] ``` #In vitro assay descriptors (optional step) In vitro assay descriptors for the measurements in Supplemental Materials Table S2 were added as additional assays into httk:invitro.assay.params using the AENM “IVD” + Author + Year (that is, Smith et al. (2000) would be AENM “IVDSmith2000”. ```{r, eval = execute.vignette} ### Create assay_component_endpoint_name columns for assay parameters ### #create assay_name column IVD_Manuscript_data[, assay_name := gsub(", ", "", IVD_Manuscript_data$Citation)] #create the assay_component_endpoint_name column IVD_Manuscript_data[, assay_component_endpoint_name := paste("IVD", gsub(", ", "", assay_name), sep="")] #tack on the "interesting" part for each assay to create the endpoint_name IVD_Manuscript_data[assay_component_endpoint_name=="IVDBellwon2015b" | assay_component_endpoint_name == "IVDBroeders2013" | assay_component_endpoint_name == "IVDWei2010", assay_component_endpoint_name := paste(assay_component_endpoint_name, CellType, sep="_")] #these two have multiple concentrations or time points IVD_Manuscript_data[assay_component_endpoint_name=="IVDMundy2004" | assay_component_endpoint_name=="IVDLundquist2014", assay_component_endpoint_name := paste(paste(assay_component_endpoint_name, paste(ExposureDuration_hrs, "hr", sep = ""), sep="_"), paste(FBSf, "FBSf", sep = ""), sep = "_")] #create a copy with the assay parameters for the options table (Table S7) options.IVD_Manuscript_data<-copy(IVD_Manuscript_data) ### Pull in httk assay parameter table ### invitro.assay.params <- copy(invitro.assay.params) #copy the experimental data frame to pull out the parameter lines parameters_only<-copy(IVD_Manuscript_data) ### Set up assay parameters to match with invitro.assay.params.RData ### #rename columns to align with the names in invitro.assay.params.RData parameters_only[, timepoint_hr := ExposureDuration_hrs] %>% .[, media_serum := SerumType] %>% .[, cell_type := CellType] # get the list of columns we need ivpnames<-names(invitro.assay.params) #add on the extra columns to match invitro.assay.params and parameters_only if(!all(ivpnames%in%names(parameters_only))){ parameters_only[,ivpnames[!(ivpnames %in% names(parameters_only))]] <- as.double(NA)} #filter parameters_only to match up with ivpnames and only the unique rows parameters_only<-parameters_only %>% select(as.character(ivpnames)) %>% unique() # reassign own parameters table to pass to armitage and kramer functions user_assay_parameters<-parameters_only # should have 29 + 1649 = 1678 rows #remove parameter columns since we will be pulling from "user_assay_parameters" ivpnames[2]<-NA #remove endpoint_name bc we need to be able to reference it IVD_Manuscript_data[,ivpnames[(ivpnames %in% names(IVD_Manuscript_data))]]<- NULL ``` # Run the in vitro distribution models ```{r run_IVD_models, eval = execute.vignette} # run Armitage model, output concentrations in umol/L (e.g. uM) armitageOutput_data.dt <- armitage_eval(tcdata=IVD_Manuscript_data, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters, restrict.ion.partitioning = TRUE) # run Kramer model, output concentrations in umol/L (e.g. uM) kramerOutput_data.dt <- kramer_eval(tcdata=IVD_Manuscript_data, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters, restrict.ion.partitioning = TRUE) ``` # Manipulate data and prep for plotting ```{r manipulate_for_plotting, eval = execute.vignette} #delete nomconc column Armitage.dt<-subset(armitageOutput_data.dt, select=-c(nomconc)) #pull ioc types from armitage since kramer doesn't generate them IOC_Type_output<-Armitage.dt[,.(ChemicalName, IOC_Type)] %>% distinct() #delete nomconc column and add in the IOC type column for Kramer Kramer.dt_subset<-subset(kramerOutput_data.dt, select=-c(nomconc)) Kramer.dt<-Kramer.dt_subset[IOC_Type_output, on = "ChemicalName"] #delete cellconc column and add in the IOC type column for Nominal Nominal.dt_subset<-subset(kramerOutput_data.dt, select = -c(concentration_cells)) Nominal.dt<-Nominal.dt_subset[IOC_Type_output, on = "ChemicalName"] #add tags Armitage.dt[,label:= "Armitage"] Kramer.dt[,label:= "Kramer"] Nominal.dt[,label:= "Nominal"] #combine data for plotting (datasets are side by side) combodata<-merge(Armitage.dt, kramerOutput_data.dt, by = "X", all.x = TRUE) #kramer output still has nomconc values if(save_output){ write.csv(combodata, file = paste0(path_out,"ManuscriptOutput_", data.date,"_",Sys.Date(),".csv")) } ``` ```{r ggplot_set_up, eval = execute.vignette} #set default theme elements for visually consistent manuscript plots personal_theme <- function(){ theme_bw() + theme(axis.line = element_blank(), panel.border = element_rect(colour = "black", linewidth=1), axis.title = element_text(size = rel(1.25)), axis.text = element_text(size = rel(1.15)), plot.title = element_text(size = rel(1.5), hjust = 0.5), panel.grid = element_blank(), legend.position = "none") } #set plot colors to correspond with each chemica plotcolors<-scales::viridis_pal()(length( unique(armitageOutput_data.dt$ChemicalName))) names(plotcolors)<-armitageOutput_data.dt$ChemicalName %>% unique() %>% sort() ``` # Main Paper Plots: ## Figure 1: In Vitro Distribution Diagram (no code) ## Figure 2: Predicted and experimental intracellular concentrations from the Nominal = Cfree (left panel), Armitage (middle panel), and Kramer (right panel) models. Dashed line shows unity. ```{r joint_intracel_plot_fig2, eval = execute.vignette} #ARMITAGE: #Armitage RMSLE: arm_rmsle_long<-rmsle((Armitage.dt$CellConcentration_uM),(Armitage.dt$ccells)) arm_rmsle<-round(arm_rmsle_long, 2) #1.12 #Armitage r^2: ml_arm_overall = lm(CellConcentration_uM~ccells, data = Armitage.dt) summary(ml_arm_overall)$r.squared #0.33 #plot set up yrng <- range(Armitage.dt$CellConcentration_uM) xrng <- range(Armitage.dt$ccells) n_chems<-n_distinct(Nominal.dt$ChemicalName) n_data<-nrow(Nominal.dt) # now plot Armitage_ccell<-ggplot(Armitage.dt) + geom_point(aes(ccells, CellConcentration_uM, color = ChemicalName)) + geom_abline(intercept=0,slope=1, linetype = "dashed") + xlab(expression("Armitage Predicted "*italic("C"["cell"])~"(\u03BCM)")) + ylab(expression("Experimental "*italic(" C"["cell"])~"(\u03BCM)")) + scale_x_log10(lim = c(0.00001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + scale_y_log10(lim = c(0.00001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + annotate(geom = "text", x = xrng[2], y = yrng[1], label = parse(text = paste( "N[data] ", "~'='~", n_data)), hjust = 1, vjust = 0, size = 4) + annotate(geom = "text", x = xrng[2], y = yrng[1], label = parse(text = paste( "N[chemicals] ", "~'='~", n_chems)), hjust = 1, vjust = 1.2, size = 4)+ annotate(geom = "text", x = xrng[2], y = yrng[1], label = paste("RMSLE =", arm_rmsle), hjust = 1, vjust = 3, size = 4) + personal_theme() + scale_color_manual(values=plotcolors) #KRAMER: #Kramer RMSLE: kram_rmsle_long<- rmsle((Kramer.dt$CellConcentration_uM),(Kramer.dt$concentration_cells)) kram_rmsle<-format(round(kram_rmsle_long, digits=2), nsmall = 2) #1.30 #Kramer r^2: ml_kram_overall = lm(CellConcentration_uM~concentration_cells, data = Kramer.dt) summary(ml_kram_overall)$r.squared #0.03 #plot Kramer_ccell<-ggplot(Kramer.dt)+ geom_point(aes(concentration_cells, CellConcentration_uM, colour = ChemicalName)) + geom_abline(intercept=0,slope=1, linetype = "dashed") + scale_x_log10(lim = c(0.00001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + scale_y_log10(lim = c(0.00001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + xlab(expression("Kramer Predicted "*italic("C"["cell"])~"(\u03BCM)")) + ylab(expression("Experimental "*italic(" C"["cell"])~"(\u03BCM)")) + annotate(geom = "text", x = xrng[2], y = yrng[1], label = parse(text = paste( "N[data] ", "~'='~", n_data)), hjust = 1, vjust = 0, size = 4) + annotate(geom = "text", x = xrng[2], y = yrng[1], label = parse(text = paste( "N[chemicals] ", "~'='~", n_chems)), hjust = 1, vjust = 1.2, size = 4)+ annotate(geom = "text", x = xrng[2], y = yrng[1], label = paste("RMSLE =", kram_rmsle), hjust = 1, vjust = 3, size = 4) + personal_theme() + scale_color_manual(values=plotcolors) #NOMINAL: #Nominal RMSLE: nom_rmsle_long<-rmsle((Nominal.dt$CellConcentration_uM),(Nominal.dt$nomconc)) nom_rmsle<-round(nom_rmsle_long, 2) #1.45 Nominal_ccell<-ggplot(Nominal.dt) + geom_point(aes(nomconc, CellConcentration_uM, colour = ChemicalName))+ geom_abline(intercept=0,slope=1, linetype = "dashed") + scale_x_log10(lim = c(0.00001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + scale_y_log10(lim = c(0.00001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + xlab(expression(italic("C"["nominal"])~"(\u03BCM)")) + ylab(expression("Experimental "~italic("C"["cell"])~"(\u03BCM)")) + labs(colour = "Chemical")+ annotate(geom = "text", x = xrng[2], y = yrng[1], label = parse(text = paste( "N[data] ", "~'='~", n_data)), hjust = 1, vjust = 0, size = 4) + annotate(geom = "text", x = xrng[2], y = yrng[1], label = parse(text = paste( "N[chemicals] ", "~'='~", n_chems)), hjust = 1, vjust = 1.2, size = 4) + annotate(geom = "text", x = xrng[2], y = yrng[1], label = paste("RMSLE =", nom_rmsle), hjust = 1, vjust = 3, size = 4) + personal_theme() + theme(legend.position = "bottom", legend.key.size = unit(2, 'mm'), legend.spacing = unit(10, "pt"), legend.text = element_text(size = 6))+ guides(col = guide_legend(ncol = 7)) + scale_color_manual(values=plotcolors) #Combine the three plots w RMSLE Ccells_comboplot<-ggarrange(Nominal_ccell, Armitage_ccell, Kramer_ccell, ncol=3 , legend = "bottom", common.legend = TRUE) #save if(save_output){ ggsave(file = paste0("Figure2_",data.date,"_",Sys.Date(),".png"), path = path_out, Ccells_comboplot, width = 10, height = 5, dpi = 300) } print(Ccells_comboplot) ``` ## Figure 3: Predicted and experimental intracellular concentrations from the Armitage (left panel) and Kramer (right panel) models for the ten chemicals with multiple observations. Solid lines connect multiple measurements for a single chemical. The dashed line shows unity. ```{r multiobs_intracel_plot_fig3, eval = execute.vignette} #Armitage.multobs.dt %>% count(ChemicalName, NominalDose_uM, Citation) #some of these have super close nomconcs because they are experimentally #measured - not really relevant for this comparison because they are replicates #chemicals to remove: #Hexachlorobenzene, Malathion, Pentachlorophenol, Propiconazole #(all of the chemicals from Stadnicka, 2014) multiObsChemnameList<-Armitage.dt %>% dplyr::filter(Citation != "Stadnicka, 2014") %>% count(ChemicalName, NominalDose_uM) %>% count(ChemicalName) %>% dplyr::filter(n>=2) %>% select(ChemicalName) %>% distinct() #10 chemicals with multiple nominal doses measured #filter measured data to just these chemicals Armitage.multobs.dt_og <- Armitage.dt %>% dplyr::filter(ChemicalName %in% multiObsChemnameList$ChemicalName) Kramer.multobs.dt_og <- Kramer.dt %>% dplyr::filter(ChemicalName %in% multiObsChemnameList$ChemicalName) Nominal.multobs.dt_og <- Nominal.dt %>% dplyr::filter(ChemicalName %in% multiObsChemnameList$ChemicalName) #each have 89 observations #merge the lines with multiple observations for the same nomconc (ie tox21) Armitage.multobs.dt<-Armitage.multobs.dt_og %>% group_by(ChemicalName, NominalDose_uM) %>% mutate(mean_measuredccell = mean(CellConcentration_uM), mean_predictedccell = mean(ccells)) %>% select(ChemicalName, NominalDose_uM, mean_measuredccell, mean_predictedccell) %>% distinct() Kramer.multobs.dt<-Kramer.multobs.dt_og %>% group_by(ChemicalName, NominalDose_uM) %>% mutate(mean_measuredccell = mean(CellConcentration_uM), mean_predictedccell = mean(concentration_cells)) %>% select(ChemicalName, NominalDose_uM, mean_measuredccell, mean_predictedccell) %>% distinct() Nominal.multobs.dt<-Nominal.multobs.dt_og %>% group_by(ChemicalName, NominalDose_uM) %>% mutate(mean_measuredccell = mean(CellConcentration_uM), mean_predictedccell = mean(NominalDose_uM)) %>% select(ChemicalName, NominalDose_uM, mean_measuredccell, mean_predictedccell) %>% distinct() #down to 50 observations because the duplicates have been averaged ### Plotting ### #plot set up yrng_mult <- range(Armitage.multobs.dt$mean_measuredccell) xrng_mult <- range(Armitage.multobs.dt$mean_predictedccell) n_chems_mult<-n_distinct(Armitage.multobs.dt$ChemicalName) n_data_mult<-nrow(Armitage.multobs.dt) #ARMITAGE #Armitage RMSLE: arm_mult_rmsle_long<-rmsle((Armitage.multobs.dt$mean_measuredccell), (Armitage.multobs.dt$mean_predictedccell)) arm_mult_rmsle<-round(arm_mult_rmsle_long, 2) #1.27 #r squared ml_arm <- lm(mean_measuredccell~mean_predictedccell, data = Armitage.multobs.dt) rsq_armitage_mult <- summary(ml_arm)$r.squared #0.87 # now plot Armitage_ccell_mult<-ggplot(Armitage.multobs.dt) + geom_point(aes(mean_predictedccell, mean_measuredccell, colour = ChemicalName)) + geom_smooth(aes(mean_predictedccell, mean_measuredccell, colour = ChemicalName), method=lm, se = FALSE) + geom_abline(intercept=0,slope=1, linetype = "dashed") + scale_x_log10(lim = c(0.0001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + scale_y_log10(lim = c(0.0001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + xlab(expression("Armitage Predicted "*italic("C"["cell"])~"(\u03BCM)")) + ylab(expression("Experimental "*italic(" C"["cell"])~"(\u03BCM)")) + labs(colour = "Chemical")+ annotate(geom = "text", x = xrng_mult[2], y = yrng_mult[1], label = parse(text = paste("N[data] ", "~'='~", n_data_mult)), hjust = 1, vjust = 0, size = 4) + annotate(geom = "text", x = xrng_mult[2], y = yrng_mult[1], label = parse(text = paste("N[chemicals] ", "~'='~", n_chems_mult)), hjust = 1, vjust = 1.2, size = 4)+ annotate(geom = "text", x = xrng_mult[2], y = yrng_mult[1], label = paste("RMSLE =", arm_mult_rmsle), hjust = 1, vjust = 3, size = 4) + personal_theme() + scale_color_manual(values=plotcolors) #KRAMER #Kramer RMSLE: kram_mult_rmsle_long<-rmsle((Kramer.multobs.dt$mean_measuredccell), (Kramer.multobs.dt$mean_predictedccell)) kram_mult_rmsle<-round(kram_mult_rmsle_long, 2) # 1.58 #r squared ml_kram <- lm(mean_measuredccell~mean_predictedccell, data = Kramer.multobs.dt) rsq_kramer_mult <- summary(ml_kram)$r.squared #0.87 # now plot Kramer_ccell_mult<-ggplot(Kramer.multobs.dt) + geom_point(aes(mean_predictedccell, mean_measuredccell, colour = ChemicalName)) + geom_smooth(aes(mean_predictedccell, mean_measuredccell, colour = ChemicalName), method=lm, se = FALSE) + geom_abline(intercept=0,slope=1, linetype = "dashed") + scale_x_log10(lim = c(0.0001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + scale_y_log10(lim = c(0.0001, 400000), breaks = c(10^-4, 10^0, 10^4), labels = label_log(digits = 2)) + xlab(expression("Kramer Predicted "*italic("C"["cell"])~"(\u03BCM)")) + ylab(expression("Experimental "*italic(" C"["cell"])~"(\u03BCM)")) + labs(colour = "Chemical")+ annotate(geom = "text", x = xrng_mult[2], y = yrng_mult[1], label = parse(text = paste("N[data] ", "~'='~", n_data_mult)), hjust = 1, vjust = 0, size = 4) + annotate(geom = "text", x = xrng_mult[2], y = yrng_mult[1], label = parse(text = paste("N[chemicals] ", "~'='~", n_chems_mult)), hjust = 1, vjust = 1.2, size = 4) + annotate(geom = "text", x = xrng_mult[2], y = yrng_mult[1], label = paste("RMSLE =", kram_mult_rmsle), hjust = 1, vjust = 3, size = 4) + personal_theme() + scale_color_manual(values=plotcolors) + theme(legend.key.size = unit(2, 'mm'), legend.text = element_text(size = 6)) #arrange Ccells_comboplot_mult<-ggarrange(Armitage_ccell_mult, Kramer_ccell_mult, ncol=2, legend = "bottom", common.legend = TRUE) if(save_output){ ggsave(file = paste0("Figure3_",data.date,"_",Sys.Date(),".png"), path = path_out, Ccells_comboplot_mult, width = 8, height = 4, dpi = 300) } print(Ccells_comboplot_mult) ``` ## Figure 4: Chemical partitioning into water, air, cell, and plastic compartments for the Armitage and Kramer models. Dashed line shows unity. ```{r chemicalPart_intracel_plot_fig4, eval = execute.vignette} #Cwat_uM -------- Cwat_R Cwater_comparison<-ggplot(combodata) + geom_point(aes(concentration_medium, cwat_s, colour = ChemicalName.x)) + geom_abline(intercept=0,slope=1, linetype = "dashed") + ggtitle("Medium Concentration") + xlab(expression("Kramer Predicted "*italic("C"["medium"])~"(\u03BCM)")) + ylab(expression("Armitage Predicted "*italic(" C"["medium"])~"(\u03BCM)")) + scale_x_log10(labels = label_log(digits = 2)) + scale_y_log10(labels = label_log(digits = 2)) + personal_theme() + scale_color_manual(values=plotcolors) #Cair_uM -------- Cair_R Cair_comparison<-ggplot(combodata) + geom_point(aes(concentration_air, cair, colour = ChemicalName.x)) + geom_abline(intercept=0,slope=1, linetype = "dashed") + ggtitle("Air Concentration") + xlab(expression("Kramer Predicted "*italic("C"["air"])~"(\u03BCM)")) + ylab(expression("Armitage Predicted "*italic(" C"["air"])~"(\u03BCM)")) + scale_x_log10(labels = label_log(digits = 2)) + scale_y_log10(labels = label_log(digits = 2)) + personal_theme() + scale_color_manual(values=plotcolors) #C_cells_uM ----- Ccells Ccells_comparison<-ggplot(combodata) + geom_point(aes(concentration_cells, ccells, colour = ChemicalName.x)) + geom_abline(intercept=0,slope=1, linetype = "dashed") + ggtitle("Cell Concentration") + xlab(expression("Kramer Predicted "*italic("C"["cell"])~"(\u03BCM)")) + ylab(expression("Armitage Predicted "*italic(" C"["cell"])~"(\u03BCM)")) + scale_x_log10(lim = c(10^-4, 10^5), labels = label_log(digits = 2), breaks = c(10^-2, 10^0, 10^2, 10^4)) + scale_y_log10(lim = c(10^-4, 10^5), labels = label_log(digits = 2), breaks = c(10^-2, 10^0, 10^2, 10^4)) + personal_theme() + scale_color_manual(values=plotcolors) #Aplastic_uM_m2 - (Cplastic_R / Sarea_R) Aplastic_comparison<-ggplot(combodata) + geom_point(aes(concentration_plastic, cplastic, colour = ChemicalName.x)) + geom_abline(intercept=0,slope=1, linetype = "dashed") + ggtitle("Plastic Concentration") + xlab(expression("Kramer Predicted "*italic("C"["plastic"])~"(\u03BCM)")) + ylab(expression("Armitage Predicted "*italic(" C"["plastic"])~"(\u03BCM)")) + labs(colour = "Chemical")+ scale_x_log10(labels = label_log(digits = 2), breaks = c(10^-6, 10^-4, 10^-2, 10^0)) + scale_y_log10(labels = label_log(digits = 2)) + personal_theme() + scale_color_manual(values=plotcolors) + theme(legend.key.size = unit(1, 'mm'), legend.spacing = unit(2, "pt"), legend.text = element_text(size = 6)) compartment_plots<-ggarrange(Ccells_comparison, Aplastic_comparison, Cwater_comparison, Cair_comparison, common.legend = TRUE, legend = "none") if(save_output){ ggsave(file = paste0("Figure4_",data.date,"_",Sys.Date(),".png"), path = path_out, compartment_plots, width = 8, height = 6, dpi= 300) } print(compartment_plots) ``` ## Figure 5: Comparison of Armitage model using curated vs dashboard physchem values. ```{r curated_values_plot_fig5, eval = execute.vignette} #load curated data info original_curated.data.dt <- httk::Dimitrijevic.IVD #first - run using the default/dashboard values default.dt<-copy(original_curated.data.dt %>% select(-c( "Arnot_pka", "Arnot_pkb", "Chemaxon_pKa", "Chemaxon_pKb", "log.KOW.N", "log.KAW.N", "Predicted_Ccell_µM", "FoA"))) # run the model, output concentrations in umol/L (e.g. uM) armitageOutput_default.dt <- armitage_eval(tcdata=default.dt, restrict.ion.partitioning = TRUE, surface.area.switch = FALSE) #next- run using curated values (EAS-E Suite) curated.data.dt<-copy(original_curated.data.dt) #overwrite using curated physchem properties curated.data.dt[, pKa_Donor := as.character(Arnot_pka)] %>% #acidic .[, pKa_Accept := as.character(Arnot_pkb)] %>% #basic .[, gkow := log.KOW.N] %>% .[, gkaw_n := log.KAW.N] # run the model, output concentrations in umol/L (e.g. uM) armitageOutput_curated.data.dt <- armitage_eval(tcdata=curated.data.dt, restrict.ion.partitioning = TRUE, surface.area.switch = FALSE) #save output with these values (Supplemental Materials T9) defaultvals<- armitageOutput_default.dt %>% select(Name, casrn, gkow_n, gkaw_n, pKa_Donor, pKa_Accept) curatedvals<-armitageOutput_curated.data.dt %>% select(Name, casrn, gkow_n, gkaw_n, pKa_Donor, pKa_Accept) #tie the two tables together for the supplement s9_table<-rbind(curatedvals, defaultvals) #calculate rmsle #default values #individual armitageOutput_default.dt[,rmsle_ccell_default:= rmsle(Reported_Ccell_µM, ccells), by = Name] #total RMSLE_default.dt<-rmsle(armitageOutput_default.dt$Reported_Ccell_µM, armitageOutput_default.dt$ccells) RMSLE_default.dt<-round(RMSLE_default.dt, 3) #0.586 #curated values #individual armitageOutput_curated.data.dt[,rmsle_ccell_curated:= rmsle(Reported_Ccell_µM, ccells), by = Name] #total RMSLE_curated.data.dt<-rmsle(armitageOutput_curated.data.dt$Reported_Ccell_µM, armitageOutput_curated.data.dt$ccells) RMSLE_curated.data.dt<-round(RMSLE_curated.data.dt, 3) #0.570 #improvement when using curated data: RMSLE_default.dt-RMSLE_curated.data.dt #0.016 ## difference between the two on a per-chemical basis ## curated_data.dt<-merge(armitageOutput_default.dt, armitageOutput_curated.data.dt, by = "Name") #calculate rmsle difference for each chemical curated_data.dt[,rmsle_difference:= rmsle_ccell_curated-rmsle_ccell_default, by = Name] #which chemicals had the greatest magnitude of difference #largest improvement when using curated data head(curated_data.dt %>% select(rmsle_difference, Name) %>% arrange(rmsle_difference), 2) #largest improvement when using default data tail(curated_data.dt %>% select(rmsle_difference, Name) %>% arrange(rmsle_difference), 2) #plot curated_difference_httk_plot<-ggplot(curated_data.dt) + geom_point(aes(ccells.y, Reported_Ccell_µM.y, colour = Name), size = 4) + geom_label_repel(aes(ccells.y, Reported_Ccell_µM.y, label = round(rmsle_difference, 2))) + geom_abline(intercept=0,slope=1, linetype = "dashed") + xlab(expression("Armitage Predicted "*italic("C"["cell"])~"(\u03BCM)")) + ylab(expression("Experimental "*italic(" C"["cell"])~"(\u03BCM)")) + labs(color = "Chemical")+ scale_x_log10(lim = c(10, 10000), labels = label_log(digits = 2)) + scale_y_log10(lim = c(10, 10000), labels = label_log(digits = 2)) + personal_theme() + scale_color_viridis(discrete=TRUE)+ theme(legend.position = "bottom", legend.key.size = unit(2, 'mm'), legend.spacing = unit(10, "pt"), legend.text = element_text(size = 10)) if(save_output){ ggsave(file = paste0("Figure5_",data.date,"_",Sys.Date(),".png"), path = path_out, curated_difference_httk_plot, width = 8, height = 5, dpi = 300) write.csv(s9_table, file = paste0(path_out,"table_s9_",data.date,"_", Sys.Date(),".csv")) } ``` # Main paper Calculations: ## Over/underestimation ```{r fold_change_calculation, eval = execute.vignette} # The Armitage model resulted in a slight over prediction of Ccell whereas # the Kramer model had the greatest spread but most evenly over and # underpredicted Ccell (Figure 2). range(Armitage.dt$ccells) #1.555754e-02 4.845393e+04 range(Kramer.dt$concentration_cells) #5.838149e-04 2.663947e+04 #Kramer has a larger spread of predictions ##create datatable with both outputs stacked stacked.data<-rbind(Armitage.dt, Kramer.dt, fill=TRUE) #overpredict or underpredict for each model compared to experimental stacked.data_HL<-copy(stacked.data) stacked.data_HL[ccells>CellConcentration_uM, ARMITAGE_ccellcomp:="ARMITAGE higher"] %>% .[ccells% .[concentration_cells>CellConcentration_uM, KRAMER_ccellcomp:="KRAMER higher"] %>% .[concentration_cells% count(ARMITAGE_ccellcomp, KRAMER_ccellcomp) #Armitage model more often over predicts #Kramer is pretty even split of over and under predictions #pick out where armitage prediction or kramer prediction is larger sidebyside.data_HL<-copy(combodata) sidebyside.data_HL[ccells>concentration_cells, ARMITAGE:="ARMITAGE higher"] %>% .[ccells% select(ChemicalName.x, ARMITAGE, KRAMER) %>% distinct() %>% count(ARMITAGE, KRAMER) #armitage: higher for 38 chemicals #kramer: higher for 5 chemicals #38+5 = 43 chemicals overall ``` ## Effect of experimentally measured nominal dose Change in accuracy of intracellular concentration predicted by studies reporting experimentally measured nominal dose ```{r experimental_nomconc, eval = execute.vignette} #pull observations that did experimentally measure the nominal dose exp_nom.dt<-copy(IVD_Manuscript_data[!(is.na(ExperimentalDose_uM)),]) #6 citations did measure experimental dose #42 observations #run armitage with nomconc as NominalDose_uM.x (not experimentally measured) arm_nomconc<-copy(exp_nom.dt[, nomconc := NominalDose_uM]) # run Armitage model, output concentrations in umol/L (e.g. uM) armitageOutput_arm_nomconc <- armitage_eval(tcdata=arm_nomconc, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters, restrict.ion.partitioning = TRUE) #run armitage with nomconc as ExperimentalDose_uM.x (experimentally measured) arm_expconc<-copy(exp_nom.dt[, nomconc := ExperimentalDose_uM]) armitageOutput_arm_expconc <- armitage_eval(tcdata=arm_expconc, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters, restrict.ion.partitioning = TRUE) #Armitage NOMCONC RMSLE: arm_nomconc_rmsle_long<-rmsle((armitageOutput_arm_nomconc$CellConcentration_uM), (armitageOutput_arm_nomconc$ccells)) arm_nomconc_rmsle<-round(arm_nomconc_rmsle_long, 2) #1.62 #Armitage EXPCONC RMSLE: arm_expconc_rmsle_long<-rmsle((armitageOutput_arm_expconc$CellConcentration_uM), (armitageOutput_arm_expconc$ccells)) arm_expconc_rmsle<-round(arm_expconc_rmsle_long, 2) #1.53 #improvement between the experimental and nomconc rmsles arm_nomconc_rmsle- arm_expconc_rmsle #0.09 ``` #Supplement: ## Table S1. Physicochemical properties of test chemicals and Ccell/Cnominal ratios from each reference. ```{r physchem_data_tableS1, eval = execute.vignette} #create datatable to hold the info table1.dt<-copy(Nominal.dt) # pull out the columns we need data.dt<-table1.dt[, c("Citation", "casrn", "ChemicalName", "IOC_Type", "nomconc", "CellConcentration_uM"), with = FALSE] # pull dtxsids, log P, henry's law, and pka: #dtxsids cheminfo.dt<-as.data.table(get_chem_id(chem.cas=unique(data.dt$casrn))) cheminfo.dt[, casrn:=chem.cas] %>% #rename .[,chem.cas:=NULL] %>% #delete .[,chem.name:=NULL] #delete #add dtxsid info to the storage file data.dt2<-data.dt[cheminfo.dt,on="casrn"] #collect log P, henry's law, and pka values data.dt2[, c("logP","logHenry", "pKa_Donor", "pKa_Accept") := as.data.frame(get_physchem_param(param = c("logP","logHenry", "pKa_Donor", "pKa_Accept"), chem.cas = casrn))] # calculate median CellConcentration_uM:cnom ratio # (for each citation if multiple) and number of observations per chemical #get ccell : cnom ratio data.dt2[,cellconc_cnom_ratio:=CellConcentration_uM/nomconc] #for each chemical/citation combination, calculate the median ccell:cnom ratio data.dt2[, median_ratio := median(cellconc_cnom_ratio), by=list(Citation, ChemicalName)] #count the number of observations by citation and chemname data.dt2[, number_of_obs:=.N, by=list(Citation, ChemicalName)] #filter down to one observation per citation/casrn combo final_output<-data.dt2 %>% mutate(logP=round(logP, 2), logHenry=round(logHenry, 2), median_ratio=round(median_ratio, 3)) %>% select(-c("casrn", "nomconc", "CellConcentration_uM", "cellconc_cnom_ratio")) %>% #get rid of extra rows distinct(Citation, ChemicalName, .keep_all = TRUE) #1 obs per citation/chem pair #save the output if(save_output){ write.csv(final_output, file = paste0(path_out,"table_s1_",data.date,"_", Sys.Date(),".csv")) } ``` ## Table S2. In vitro assay descriptors for the measurements ```{r assay_descriptors_tableS2, eval = execute.vignette} #copy table to manipulate parameter_table<-copy(Nominal.dt) table_s2<-parameter_table[, c("Citation", "ChemicalName", "nomconc", "NominalDose_uM", "v_total", "v_working", "sarea", "well_number", "cell_yield", "FBSf", "FigureNumber", "CellType")] #subset columns #save the output if(save_output){ write.csv(table_s2, file = paste0(path_out,"table_s2_raw_",data.date,"_", Sys.Date(),".csv")) } ``` ## Table S3. Root mean squared log error (RMSLE) and coefficient of determination (r2) for the chemicals with multiple observations. Note: Atrazine, Flusilazole, N-Phenyl-1,4-benzenediamine, Rosiglitazone, Thiacloprid, and Triphenyl phosphate each only reflect two data points. ```{r multobs_RMSLE_tableS3, eval = execute.vignette} #Armitage #copy for editing multobs_rmsle_armitage<-copy(as.data.table(Armitage.multobs.dt)) #group by chemname, calc RMSLE, r^2, and number of obs multobs_rmsle_armitage[, RMSLE_armitage_mult:= rmsle(mean_measuredccell, mean_predictedccell), by = "ChemicalName"] %>% #rmsle .[, rsq_armitage_mult:= summary(lm(mean_measuredccell~mean_predictedccell))$r.squared, by = "ChemicalName"] %>% #r^2 .[, numberofpoints:= .N, by = "ChemicalName"] #numberofpoints = number of datapoints considered (includes averaged points) #repeat for kramer #copy for editing multobs_rmsle_kramer<-copy(as.data.table(Kramer.multobs.dt)) #group by chemname, calc RMSLE, r^2, and number of obs multobs_rmsle_kramer[, RMSLE_kramer_mult:= rmsle(mean_measuredccell, mean_predictedccell), by = "ChemicalName"] %>% #rmsle .[, rsq_kramer_mult:= summary(lm(mean_measuredccell~mean_predictedccell))$r.squared, by = "ChemicalName"] %>% #r^2 .[, numberofpoints:= .N, by = "ChemicalName"] #cut down both to the columns we need and only 1 obs per chemical mult.dt<-merge(multobs_rmsle_armitage[, -c("NominalDose_uM", "mean_measuredccell", "mean_predictedccell")] %>% distinct() , multobs_rmsle_kramer[, -c("NominalDose_uM", "mean_measuredccell", "mean_predictedccell")] %>% distinct()) #for higher/lower rmsle comparison mult_HL<-copy(mult.dt[RMSLE_kramer_mult>RMSLE_armitage_mult, higherorlower:="Kramer higher"] %>% .[RMSLE_kramer_mult% count(higherorlower) #armitage predictions have lower error for 6 chemicals #kramer predictions are lower error for the other 4 #save the output if(save_output){ write.csv(mult.dt, file = paste0(path_out,"table_s3_",data.date,"_",Sys.Date(),".csv")) } ``` ## Table S4: RMSLE and r^2 for acids and bases. ```{r ionization_RMSLE_tableS4, eval = execute.vignette} #copy for editing ionization_rmsle<-copy(stacked.data) #calculate RMSLSE and r^2 for each model and ionization type ionization_rmsle_output<- copy(ionization_rmsle[label=="Armitage", RMSLE_armitage_ionization:= rmsle(CellConcentration_uM, ccells), by = "IOC_Type"] %>% #RMSLE .[label=="Armitage", rsq_armitage_ionization:= summary(lm(CellConcentration_uM~ccells))$r.squared, by = "IOC_Type"] %>% #r^2 .[label=="Kramer", RMSLE_kramer_ionization:= rmsle(CellConcentration_uM, concentration_cells), by = "IOC_Type"] %>% #RMSLE .[label=="Kramer", rsq_kramer_ionization:= summary(lm(CellConcentration_uM~concentration_cells))$r.squared, by = "IOC_Type"] %>% #r^2 .[,numberofpoints:= .N, by = "IOC_Type"] %>% .[, c("IOC_Type", "RMSLE_armitage_ionization", "rsq_armitage_ionization", "RMSLE_kramer_ionization", "rsq_kramer_ionization", "numberofpoints")] %>% #select columns we want distinct()) print(ionization_rmsle_output) #number of points is overall, not for each calculation #save the output if(save_output){ write.csv(ionization_rmsle_output, file = paste0(path_out,"table_s4_",data.date,"_",Sys.Date(),".csv")) } ``` ## Table S5. RMSLE across lipophilicity. ```{r logP_RMSLE_tableS5, eval = execute.vignette} #cut up into chunks gkowbins<-stacked.data %>% mutate(LogP_bins = cut(gkow, breaks = c(-Inf, 1.30, 3.36, Inf), labels = c("Hydrophilic", "Moderate Lipophilic", "Lipophilic"))) #calculate rmsle for each model and logP bin combo gkowbins_rmsle_output<-suppressWarnings(copy( gkowbins[label == "Armitage", Armitage_RMSLE := rmsle( CellConcentration_uM, ccells), by = LogP_bins] %>% .[label == "Kramer", Kramer_RMSLE := rmsle( CellConcentration_uM, concentration_cells), by = LogP_bins] %>% .[, numberofpoints := .N, by = list(LogP_bins, label)] %>% .[, c("LogP_bins", "Armitage_RMSLE", "Kramer_RMSLE", "numberofpoints", "label")] %>% distinct())) print(gkowbins_rmsle_output) #number of points is overall, not for each calculation #save the output if(save_output){ write.csv(gkowbins_rmsle_output, file = paste0(path_out,"table_s5_",data.date,"_",Sys.Date(),".csv")) } ``` ## Table S6. RMSLE across volatility. ```{r kaw_RMSLE_tableS6, eval = execute.vignette} #source for chunks is table 2 Volatility class in water: #https://www.epa.gov/pesticide-science-and-assessing-pesticide-risks/ #guidance-reporting-environmental-fate-and-transport#:~:text=is%20very%20low.-, #Volatility%20from%20Water,the%20reciprocal%20of%20KAW.&text=GMW%20=%20gram%20 #molecular%20weight%20in%20g/mole.&text=This%20value%20is%20the%20inverse,as%20 #Cwater/Cair.&text=Note%20that%20all%20chemicals%20may,volatility%20potential%20 #is%20very%20low. #cut up kaw into relevant bins kawbins<-stacked.data %>% mutate(Kaw_bins = cut(DR_kaw, breaks = c(Inf, 10^-2, 10^-3, 10^-5, -Inf), labels = c("Rapidly lost from a water surface", "Volatile from a water surface", "Slightly volatile from a water surface", "Non-volatile"))) #calculate rmsle for each model and logP bin combo kaw_rmsle_output<-suppressWarnings(kawbins[label == "Armitage", Armitage_RMSLE := rmsle( CellConcentration_uM, ccells), by = Kaw_bins] %>% .[label == "Kramer", Kramer_RMSLE := rmsle( CellConcentration_uM, concentration_cells), by = Kaw_bins] %>% .[, numberofpoints := .N, by = list(Kaw_bins, label)] %>% .[, c("Kaw_bins", "Armitage_RMSLE", "Kramer_RMSLE", "numberofpoints", "label")] %>% distinct()) #save the output if(save_output){ write.csv(kaw_rmsle_output, file = paste0(path_out,"table_s6_",data.date,"_",Sys.Date(),".csv")) } ``` ## Table S7. Effect of "options" on rmsle in the Armitage model. ```{r options_tableS7, eval = execute.vignette} #list of options that can be changed in the armitage model: # - this.option.kbsa2 Use alternative bovine-serum-albumin partitioning # - this.option.swat2 Use alternative water solubility correction # - this.option.kpl2 Use alternative plastic-water partitioning model # - option.plastic == TRUE (default) gives nonzero surface area # - option.bottom == TRUE (default) includes surface area of the bottom of # the well in determining sarea # - restrict.ion.partitioning # § False (default): Treat entire chemical as neutral and allow it all # to partition normally as such # § True: Restrict so only the neutral fraction of the chemical is # available to partition normally but the ionized portion is scaled ##filter to just the ones we can run with the surface area calculation since #option.plastic and option.bottom require that function: options.IVD_Manuscript_data_2<- options.IVD_Manuscript_data[!is.na(well_number), ] %>% .[,sarea:=NULL] #delete sarea so the surface area function will run #126 out of the original 153 observations # length(options.IVD_Manuscript_data$Citation %>% unique()) #14 citations # length(options.IVD_Manuscript_data_2$Citation %>% unique()) #12 citations # 153-126 data points; 27 data points total removed # create table to append the results to options.table <- NULL # run basic armitage, only setting the surface area switch option.basic <- armitage_eval(tcdata=options.IVD_Manuscript_data_2, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters) option.basic_RMSLE<-rmsle((option.basic$CellConcentration_uM), (option.basic$ccells)) #1.148 #append to table options.table<-rbind(options.table, c(round(option.basic_RMSLE ,3), "All Default")) # run with this.option.kbsa2 == TRUE (default is false) option.kbsa <- armitage_eval(tcdata=options.IVD_Manuscript_data_2, this.option.kbsa2 = TRUE, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters) option.kbsa_RMSLE<-rmsle((option.kbsa$CellConcentration_uM), (option.kbsa$ccells)) #1.150 #append to table options.table<-rbind(options.table, c(round(option.kbsa_RMSLE, 3), "option.kbsa=TRUE")) # run with this.option.swat2 == TRUE (default is false) option.swat <- armitage_eval(tcdata=options.IVD_Manuscript_data_2, this.option.swat2 = TRUE, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters) option.swat_RMSLE<-rmsle((option.swat$CellConcentration_uM), (option.swat$ccells)) #3.585 #append to table options.table<-rbind(options.table, c(round(option.swat_RMSLE, 3), "option.swat2=TRUE")) # run with this.option.kpl2 == TRUE (Fischer) (default is false/kramer) option.kpl <- armitage_eval(tcdata=options.IVD_Manuscript_data_2, this.option.kpl2 = TRUE, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters) option.kpl_RMSLE<-rmsle((option.kpl$CellConcentration_uM),(option.kpl$ccells)) #1.101 #append to table options.table<-rbind(options.table, c(round(option.kpl_RMSLE, 3), "option.kpl2=TRUE")) # run with option.plastic == FALSE (default is true) opt.plast_input<- copy(options.IVD_Manuscript_data_2) opt.plast_input[,option.plastic := FALSE] option.plast <- armitage_eval(tcdata=opt.plast_input, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters) option.plast_RMSLE<-rmsle((option.plast$CellConcentration_uM), (option.plast$ccells)) #1.178 #append to table options.table<-rbind(options.table, c(round(option.plast_RMSLE, 3), "option.plastic=FALSE")) # run with option.bottom == FALSE (default is true) option.bottom_input<- copy(options.IVD_Manuscript_data_2) option.bottom_input[,option.bottom := FALSE] option.bottom <- armitage_eval(tcdata=option.bottom_input, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters) option.bottom_RMSLE<-rmsle((option.bottom$CellConcentration_uM), (option.bottom$ccells)) #3.956 #append to table options.table<-rbind(options.table, c(round(option.bottom_RMSLE,3), "option.bottom=FALSE")) # run with restrict.ion.partitioning == TRUE (default is FALSE) option.restion <- armitage_eval(tcdata=options.IVD_Manuscript_data_2, restrict.ion.partitioning = TRUE, surface.area.switch = TRUE, user_assay_parameters = user_assay_parameters) option.restion_RMSLE<-rmsle((option.restion$CellConcentration_uM), (option.restion$ccells)) #1.152 #append to table options.table<-rbind(options.table, c(round(option.restion_RMSLE,3), "restrict.ion.partitioning = TRUE")) #save the output if(save_output){ write.csv(options.table, file = paste0(path_out,"table_s7_",data.date,"_",Sys.Date(),".csv")) } ``` ## Figure S1. 4 Histogram showing chemical-specific RMSLE for Armitage Ccell vs Measured Ccell and Kramer Ccell vs Measured Ccell: ```{r intracellular_histogram_plotS1, eval = execute.vignette} #calculate RMSLE for each row of data combodata[, RMSLE_armitage_by_chem:= rmsle(CellConcentration_uM.x, ccells), by = "ChemicalName.x"] combodata[, RMSLE_kramer_by_chem:= rmsle(CellConcentration_uM.x, concentration_cells), by = "ChemicalName.x"] combodata[, RMSLE_nominal_by_chem:= rmsle(CellConcentration_uM.x, nomconc), by = "ChemicalName.x"] #cut down to one observation per chemical histo.data<-copy(combodata %>% select(ChemicalName.x, RMSLE_armitage_by_chem, RMSLE_kramer_by_chem, RMSLE_nominal_by_chem) %>% unique()) #plot histograms nominal_histogram_by_chem <-ggplot(histo.data, aes(x=RMSLE_nominal_by_chem)) + geom_histogram(binwidth = 0.5, boundary = 0) + labs(x = "Nominal RMSLE", y = "Number of Observations") + xlim(0, 5) + ylim(0, 17) + theme_classic() + theme(axis.text=element_text(size=12), axis.title=element_text(size=14)) armitage_histogram_by_chem <-ggplot(histo.data, aes(x=RMSLE_armitage_by_chem)) + geom_histogram(binwidth = 0.5, boundary = 0) + labs(x = "Armitage RMSLE", y = "Number of Observations") + xlim(0, 5) + ylim(0, 17) + theme_classic() + theme(axis.text=element_text(size=12), axis.title=element_text(size=14)) kramer_histogram_by_chem <- ggplot(histo.data, aes(x=RMSLE_kramer_by_chem)) + geom_histogram(binwidth = 0.5, boundary = 0) + labs(x = "Kramer RMSLE", y = "Number of Observations") + xlim(0, 5) + ylim(0, 17) + theme_classic()+ theme(axis.text=element_text(size=12), axis.title=element_text(size=14)) histogram_comboplot_by_chem<-ggarrange(nominal_histogram_by_chem + theme(axis.ticks.y = element_blank(), plot.margin = margin(r = 1)), armitage_histogram_by_chem + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank(), plot.margin = margin(r = 1, l = 1)), kramer_histogram_by_chem + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank(), plot.margin = margin(r = 1, l = 1)), ncol=3) if(save_output){ ggsave(file = paste0("Figure_S1_by_chem_",data.date,"_",Sys.Date(),".png"), path = path_out, histogram_comboplot_by_chem, width = 10, height = 5, dpi = 300) } print(histogram_comboplot_by_chem) ```