--- title: "Meade (submitted): High Throughput Dermal Exposure Model" author: "Annabel Meade, Celia Schacht, Alex George" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{g) Meade (Submitted): High Throughput Dermal Exposure Model} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- *Please send questions to wambaugh.john@epa.gov or aemeade7@gmail.com* from "Development and analysis of high throughput physiologically based pharmacokinetic/toxicokinetic (PBPK/TK) dermal exposure model" Annabel E. Meade, Celia M.Schacht, Marina V Evans, Alex George, Rachael Cogbill, Risa R. Sayre, and John F. Wambaugh ## Abstract Dermal absorption of chemicals represents an important route of exposure in pharmaceutical, occupational, and environmental settings. There are thousands of chemicals in use with little or no toxicity or toxicokinetic data, and dermal absorption data are lacking, unsurprisingly, for many of these chemicals. One alternative is to estimate human toxicokinetics using high-throughput methods. Accordingly, the aim of this study was to develop a generalized physiologically based pharmacokinetic/toxicokinetic (PBPK/TK) dermal exposure model for the R package httk. For dermal exposures this model can be used in a high-throughput manner to estimate human blood and tissue concentrations and estimate risk for many chemicals. The structure of the dermal PBPK/TK model was based on Campbell, Clewell, Gentry, Anderson, and Clewell, Computational Toxicology, 929, 439 (2012). Chemical-specific metabolism and protein binding data were obtained from the literature as collected by R package httk. The physiochemical property data needed for the model were obtained from the EPA CompTox Chemicals Dashboard. Here we have compared two different in vitro methods for estimating chemical-specific and vehicle-specific permeability in the skin: 1) Potts-Guy [Potts and Guy, Pharm Res, 9, 663 (1992)] and 2) Surrey [Chen, Han, Saib, and Lian, Pharm Res, 32, 1779 (2015)]. The model was constructed to allow comparison of dermal, oral, and intravenous exposures. Over 26 exposure scenarios across 13 chemicals were modeled and compared to published concentration-time in vivo data from the EPA CvT database [Sayre, Wambaugh, and Grulke, Scientific Data, 7, 122 (2020)]. Of these 13 chemicals, four are pharmaceuticals while the other 9 are occupationally or environmentally relevant, 9 of the 13 are lipophilic (log of the octanol to water partition coefficient, logP > 2), three of the 13 have low water solubility (< 10 mg/L), four of the 13 have high water solubility (> 1,000 mg/L), and eight of the 13 are considered volatile. The Root Mean Squared Log10 Error (RMSLE) between log-transformed simulated and observed concentrations was calculated to determine how well the model captures the behavior of this data when using the methods for dermal permeability. Based on these results, the Potts-Guy methods slightly out-perform the Surrey method of calculating dermal permeability for more volatile chemicals and the vehicle is a large factor in method success, where Potts-Guy/octanol and Surrey/water combinations generated the best fits. We also found that exhalation is an important route of excretion for volatiles, in particular. The views expressed in this abstract are those of the authors and do not necessarily represent the views or the policies of the U.S. Environmental Protection Agency. ## HTTK Version This vignette was created with **httk** v2.3.3. Although we attempt to maintain backward compatibility, if you encounter issues with the latest release of **httk** and cannot easily address the changes, historical versions of httk are available from: https://cran.r-project.org/src/contrib/Archive/httk/ ## Table of Contents (added by AG 01/02/2024) Prepare for session Result 1: Model Verification Usint *In Vivo* Data Step One: Load Data Display which chemicals are being looked at Step Two: Clean and Format Data ## Prepare for session ```{r configure_knitr, eval = TRUE, echo=FALSE} knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, comment = '#>') knitr::opts_chunk$set(echo = TRUE,out.width=700, fig.align='center') #figure out.width=700px options(rmarkdown.html_vignette.check_title = FALSE) # Clear memory rm(list=ls()) # Useful function `%!in%`<- Negate(`%in%`) ``` ### eval = FALSE If you are using the RMarkdown version of this vignette (has the file extension .RMD) you will be able to see that several chunks of code in this vignette have the statement `eval = FALSE`. The next chunk of code, by default, sets `run.simulations = FALSE`. This means that the code is included (and necessary) but was not run when the vignette was built. We do this because some steps require extensive computing time and CRAN has limits how long we can spend building the package. If you want this vignette to work, you must run all code, either by cutting and pasting it into R. Or, open the .RMD file in RStudio and either change run.simulations to TRUE or press "play" (the green arrow) on each chunk in RStudio. We set a variable `run.simulations`. This will run simulations and regenerate data files. If this is set to `TRUE`, make sure you have the following data file: * R_CvT_Database/CvT_all_tables_2202-08-17.Rdata: CvT data inside the [CvTdb](https://doi.org/10.1038%2Fs41597-020-0455-1) * R_CvT_Database/CvT_all_tables_2202-09-14_noSQL.Rdata: CvT data outside the [CvTdb](https://doi.org/10.1038%2Fs41597-020-0455-1) which had to be curated and loaded manually. * CCD-Batch-Search_2022-11-29_07_06_35.xlsx: Excel file loaded from CompTox Dashboard for thirteen chemicals of interest. #Note - also need file invitrodb_3_5_mc5.Rdata - downloaded ToxCast data. AG 12/7/2023 and if `run.simulations = FALSE`, make sure to have the following files in your working directory: * *`r paste0("meade2023_",format(Sys.time(), "%b_%d_%Y"),".Rdata")`*: + *concentration_data_meade2023*: concentration vs time and dosing data + *conc_predictions_meade2023fDF* : simulations with and without exhalation included in the model, joined with concentration data. + *supptab1_meade2023*: Supplementary Table 1 which contains chemical data used for each of the simulated 13 chemicals saved in your working directory. + *ivive_meade2023* : IVIVE information for part 2 (if run.simulations.2 = FALSE) ```{r runchunks, eval = TRUE} # Set whether or not the following chunks will be executed (run): run.simulations <- FALSE # Change FALSE to TRUE here to make the vignette work make.plots <- FALSE # change to TRUE to make the plots run.simulations.2 <- FALSE #for second part of vignette make.plots.2 <- FALSE # change to TRUE to make the plots ``` ### Set working directory and install packages Here, we set our working directory (the location of this file and the necessary data files) and load necessary packages. **Please reset your working directory below to `loc.wd` or the code will produce an error.** We also set the location of the R package `httk` using the name `loc.httk` so that we can run the functions `modelinfo_dermal.R` and `modelinfo_dermal_1subcomp.R` file later in the code. Note that if you have `run.simulations = FALSE`, the variable `loc.httk` is not used, so you do not need to provide or change it. ```{r setup, eval = TRUE} # Set working directory loc.wd <- "C:/Users/jwambaug/OneDrive - Environmental Protection Agency (EPA)/Profile/Documents/Research Projects/DermalTK/vingette" #loc.wd <- "/Users/ameade/Library/CloudStorage/GoogleDrive-aemeade7@gmail.com/My Drive/EPA/HTTK-Dermal/HTTK-Dermal-Vignette" #loc.wd = "C:/Users/cschacht/OneDrive - Environmental Protection Agency (EPA)/Profile/Documents/CCTE_2023_2024/dermal/thisone" # Set location of httk package #loc.httk <- "C:/Users/jwambaug/git/httk-dev/httk" # Load packages packages <- c("httk","dplyr", "tidyverse","xlsx","Metrics", #for data manipulation "ggplot2","ggforce","ggpubr","ggrepel","viridis", #for plotting "knitr", "ggpubr","grid","ggh4x","readr","ggforce","knitr","tidyr", "stringr") #to make pretty visual tables sapply(packages, require, character.only=TRUE) # Make sure we're using a clean version of the httk chemical library: #reset_httk() ``` # Result 1: Model Verification using *In Vivo* Data First, we want to verify our model by testing it against *in vivo* data from literature. ## Step One: Load Data We have some data which is taken from the CvT Database ([Sayre, Wambaugh, and Grulke, 2020](https://doi.org/10.1038%2Fs41597-020-0455-1)) and some data which has been prepared for the database but is not yet added to the Database, denoted as `noSQL` since it is not yet in the SQL space. The CvT Database contains structured data in the form of separate datasets linked with foreign keys ("fk"). Below is the entity-relationship for each of the databases. To learn more about where these datasets come from, please see `R_CvT_Database` folder, which contains files used to generate the data below. ```{r 1_load_data, eval = run.simulations} # Load CvT table database load(paste0(loc.wd,"/R_CvT_Database/CvT_all_tables_2022-08-17.Rdata")) #Necessary file that is not included yet # Load CvT data that is not yet in CvT database (Annabel did this) load(paste0(loc.wd,"/R_CvT_Database/CvT_all_tables_2023-07-05_noSQL.Rdata")) #Necessary file that is not included yet #Add cumulative_amount column in SQL DB_series (we don't use any cumulative amounts from SQL data in final results) DB_series <- DB_series %>% mutate(cumulative_amount=NA) # Bind data sets from database and noSQL collection. DB_documents_noSQL<-DB_documents_noSQL[colnames(DB_documents_noSQL) %in% colnames(DB_documents)] #Keep only columns from DB_documents_noSQL that are also found in DB_documents DB_documents <- rbind(DB_documents,DB_documents_noSQL) DB_studies_noSQL<-DB_studies_noSQL[colnames(DB_studies_noSQL) %in% colnames(DB_studies)] DB_studies <- rbind(DB_studies,DB_studies_noSQL) DB_chemicals <- rbind(DB_chemicals,DB_chemicals_noSQL) DB_subjects <- rbind(DB_subjects,DB_subjects_noSQL) DB_series <- rbind(DB_series,DB_series_noSQL) DB_conc_time_values_noSQL<-DB_conc_time_values_noSQL[colnames(DB_conc_time_values_noSQL) %in% colnames(DB_conc_time_values)] DB_conc_time_values <- rbind(DB_conc_time_values,DB_conc_time_values_noSQL) ``` Next, we need to clean the data and filter out `NA` results. First, we join the needed data sets using their foreign keys. * `DB_conc_time_values` * `DB_series` * `DB_studies` * `DB_subjects` * `DB_documents`: Note that only the PubMed ID (PMID) of each extraction article is saved from this data set. Next, we remove columns that are duplicated in each data set and are not used here. These columns in each data set, listed below, tell the user who created and updated the SQL files and additional comments by the data curator or article author. * `created_by` * `updated_by` * `rec_update_dt` * `rec_created_dt` * `curator_comment` * `author_comment` ( not in `DB_conc_time_values` or `DB_subjects`) We also remove any data that: * does not have a defined DTXSID for the chemical * does not have a recorded concentration * does not have a defined recorded medium (e.g., plasma, blood, urine, etc.) * where the analyte measured does not match the test substance used to dose the individual Finally, after cleaning this data set, called `df.full`, we filter so that we only have data for chemicals that have dermal data. Our final data set is called `df`. ```{r 1_filter_data, eval = run.simulations} # Join necessary DBs: conc_time_values, series, studies, subjects df.full <- DB_conc_time_values %>% left_join(DB_series,by=c("fk_series_id"="id")) %>% left_join(DB_studies,by=c("fk_study_id"="id")) %>% left_join(DB_subjects,by=c("fk_subject_id"="id")) %>% left_join(DB_documents[,c("id","pmid")],by=c("fk_extraction_document_id" = "id")) %>% dplyr::rename(extraction_document_pmid=pmid) %>% select(!(ends_with(".x") | ends_with(".y"))) %>% filter(!is.na(test_substance_dtxsid)) %>% #remove series w/o defined chemicals (Changed from study to series AG) filter( !is.na(conc)) %>% #filter out NA data filter(!is.na(conc_medium_normalized)) %>% #remove series w/o defined medium (i.e., plasma, blood, urine, etc.) filter(test_substance_dtxsid==analyte_dtxsid | is.na(analyte_dtxsid)) #analyte (measured substance) and test substance must be the same df.full <- distinct(df.full) # Remove empty cells df.full <- type_convert(df.full) df.full <- df.full %>% filter(!is.na(conc)) # Get list of chemicals that have dermal data df.dermal <- df.full %>% filter(administration_route_normalized=="dermal") dermal_dtxsid <- unique(df.dermal$test_substance_dtxsid) # Only keep data for chemicals that have dermal data df <- df.full %>% filter(test_substance_dtxsid %in% dermal_dtxsid) ``` ## Display which chemicals are being looked at ```{r summarize_cvt, eval=run.simulations} table <- df %>% mutate(Chemical = get_chem_id(dtxsid=test_substance_dtxsid)$chem.name) %>% dplyr::count(Chemical,test_substance_dtxsid,extraction_document_pmid,sort=F) %>% dplyr::rename(Number.of.Data.Points = n) row.names(table) <- 1:nrow(table) knitr::kable(table, row.names=TRUE,caption="List of chemicals with dermal data before cleaning data.") write.table(table, file=paste0(loc.wd,"/tables/ChemicalswDermalData.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") ``` Next, We look at several chemical factors with respect to our data. Thus, we ran a batch search of the 13 chemicals in our study in the CompTox Dashboard and now load information about these chemicals (Boiling point, Volatility, etc.) and save this as 'supptab1_meade2023'. ```{r 1_supptable, eval=run.simulations} # Load chemical info supptab1_meade2023 <- read.xlsx(file=paste0(loc.wd,"/CCD-Batch-Search_2022-11-29_07_06_35.xlsx"), sheetName="Main Data") #A separate download was needed for this file AG 12/6/23 # Rename chemical properties and change units supptab1_meade2023 <- supptab1_meade2023 %>% mutate(Water.Solubility.mg.L = WATER_SOLUBILITY_MOL.L_OPERA_PRED*AVERAGE_MASS*1000, Vapor.Pressure.mmHg = VAPOR_PRESSURE_MMHG_OPERA_PRED, Boiling.Point.C = BOILING_POINT_DEGC_OPERA_PRED, logP = OCTANOL_WATER_PARTITION_LOGP_OPERA_PRED, MW = AVERAGE_MASS, ) %>% select(c(PREFERRED_NAME,DTXSID,Water.Solubility.mg.L,Vapor.Pressure.mmHg,Boiling.Point.C,logP,MW)) # Add column for "Volatility supptab1_meade2023 <- supptab1_meade2023 %>% mutate(Volatility = ifelse(Boiling.Point.C<=75,"Very Volatile", ifelse(Boiling.Point.C>400,"Not Volatile", ifelse((Boiling.Point.C>=250)&(Boiling.Point.C<=400),"Semi-Volatile","Volatile"))), # Rename long chemicals to nick-names PREFERRED_NAME = ifelse(PREFERRED_NAME=="Methyl tert-butyl ether","MBTE",PREFERRED_NAME), PREFERRED_NAME = ifelse(PREFERRED_NAME=="3,3',5,5'-Tetrabromobisphenol A","Bromdian",PREFERRED_NAME), PREFERRED_NAME = ifelse(PREFERRED_NAME=="4,4'-Sulfonyldiphenol","Bisphenol S",PREFERRED_NAME), PREFERRED_NAME = ifelse(PREFERRED_NAME=="Dichloromethane","DCM",PREFERRED_NAME), PREFERRED_NAME = ifelse(PREFERRED_NAME=="Tetrachloroethylene","PERC/TCE",PREFERRED_NAME), PREFERRED_NAME = ifelse(PREFERRED_NAME=="1-Methylbenzene","Toluene",PREFERRED_NAME)) supptab1_meade2023$Volatility <- factor(supptab1_meade2023$Volatility, levels=c("Not Volatile","Semi-Volatile","Volatile","Very Volatile")) #save chemical dataframe save(supptab1_meade2023,file=paste0("meade2023_",format(Sys.time(), "%b_%d_%Y"),".Rdata")) supptab1_meade2023[,c(3:7)]=signif(supptab1_meade2023[,c(3:7)],3) knitr::kable(supptab1_meade2023, #row.names=TRUE, caption="Chemical properties", floating.environment="sidewaystable", align="c") write.table(supptab1_meade2023, file=paste0(loc.wd,"/tables/ChemicalProperties.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") ``` ## Step Two: Clean and Format Data Next we clean the data by looking only at data that can be simulated using the httk dermal model. First, we load the most recent QSAR in *silico data* ([Dawson et al., 2021](https://doi.org/10.1021/acs.est.1c05924)) from `httk`. #Add a statement about why QSAR values are needed - there are many chemicals for which httk does not have experimentally derived Clint and Fup. - AG 01/01/2024 ```{r 1_load_QSAR, eval=run.simulations} load_dawson2021() #get QSAR data, overwrite for p-values greater than 0.05 ``` Next, we only load data for routes `iv`, `oral`, and of course `dermal`. We also must remove the chemicals: dibromomethane, halothane, and isoflurane, because these chemicals do not have *in silico* or *in vitro* data to inform necessary parameters for the httk dermal model. ```{r 1_clean_data, eval=run.simulations} # Select route (dermal/iv/oral) route.ls = c("iv","dermal","oral") # Reduce data based on available in vitro data # df.reduced <- df %>% filter(administration_route_normalized %in% route.ls) %>% #only iv, oral, dermal routes # filter(test_substance_dtxsid %!in% c("DTXSID4021557","DTXSID4025371","DTXSID3020752")) #no metabolism data for: dibromomethane, halothane, isoflurane df.reduced <- df %>% filter(administration_route_normalized %in% route.ls) %>% #only iv, oral, dermal routes filter(!(test_substance_dtxsid %in% c("DTXSID4021557","DTXSID4025371","DTXSID3020752"))) #no metabolism data for: dibromomethane, halothane, isoflurane df.reduced <- type_convert(df.reduced) #change dose_level_normalized to numeric #Join df.reduced with the Volatility/chemical info from supptab1_meade2023 df.reduced = df.reduced %>% left_join(supptab1_meade2023,by=c("test_substance_dtxsid"="DTXSID")) df.reduced=subset(df.reduced, !(conc_medium_normalized %in%c("feces","urine"))) # Display which chemicals are being looked at table <- df.reduced %>% mutate(Chemical = get_chem_id(dtxsid=test_substance_dtxsid)$chem.name) %>% dplyr::count(Chemical,test_substance_dtxsid,extraction_document_pmid,conc_medium_normalized,sort=F) %>% dplyr::rename("Number.of.Data.Points" = n) row.names(table) <- 1:nrow(table) knitr::kable(table, row.names=TRUE, caption="List of chemicals with dermal data after cleaning data.") write.table(table, file=paste0(loc.wd,"/tables/ChemicalswDermalDataCleaned.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") ``` Next, we clean up the data so we can match experimental conditions to simulation conditions. We start by normalizing concentration and concentration units (where concentration refers to the experimental concentration vs. time data) ```{r conc_cleanup, eval=run.simulations} # Set dummy column for Concentration units df.reduced$conc_normalized = NA df.reduced$conc_units_normalized = df.reduced$conc_units_original allset = which(df.reduced$conc_units_original %in% c("ug/mL","ug/g","mg/kg", "mcg/mL","mg/L")) df.reduced$conc_units_normalized[allset]="mg/L"; df.reduced$conc_normalized[allset]=df.reduced$conc_original[allset] ngml = which(df.reduced$conc_units_original=="ng/ml");df.reduced$conc_normalized[ngml]= df.reduced$conc_original[ngml]*0.001 df.reduced$conc_units_normalized[ngml]="mg/L" ngl = which(df.reduced$conc_units_original=="ng/l");df.reduced$conc_normalized[ngl]= df.reduced$conc_original[ngl]*1e-06 df.reduced$conc_units_normalized[ngl]="mg/L" molunts = which(df.reduced$conc_units_original %in% c("uM","nmol/L","umol/L")) dtx=unique(df.reduced[molunts,c("test_substance_dtxsid","conc_units_original")]) dtx$mw = get_physchem_param(param="MW", dtxsid=dtx$test_substance_dtxsid) for(z in 1:dim(dtx)[1]){ these.rows=which(df.reduced$test_substance_dtxsid==dtx$test_substance_dtxsid[z] & df.reduced$conc_units_original==dtx$conc_units_original[z]) conversion_val = convert_units(input.units = dtx$conc_units_original[z],output.units = "mg/L",dtxsid=dtx$test_substance_dtxsid[z]) df.reduced[these.rows,"conc_normalized"] = df.reduced[these.rows,"conc_original"] * conversion_val df.reduced$conc_units_normalized[these.rows] = "mg/L" } perc.dose=which(df.reduced$conc_units_original %in%c("% dose", "% dose/h")) df.reduced[perc.dose, "conc_normalized"]=df.reduced[perc.dose, "conc_original"]*df.reduced[perc.dose, "dose_level_original"] #df.reduced[perc.dose,"conc_units_normalized"]=df.reduced[perc.dose,"dose_level_original_units"] df.reduced[perc.dose,"conc_units_normalized"]="mg/L" # ppbv ppbv = which(df.reduced$conc_units_original=="ppbv") df.reduced[ppbv, "conc_normalized"] = convert_units(input.units = "ppbv",output.units = "mg/L",dtxsid=unique(df.reduced$analyte_dtxsid[ppbv]),state="gas") * df.reduced[ppbv, "conc_original"] df.reduced$conc_units_normalized[ppbv] = "mg/L" ``` We also convert the dose units from the database to be in amounts (not concentrations) using the vehicle volumes. After doing this, all unique dosing scenarios are listed in a table. ```{r 1_convert_dose_units, eval=run.simulations} # Convert ug/mL to mg df.reduced <- df.reduced %>% separate(dose_volume, into=c("dose_volume","dose_volume_units"), sep=" ") %>% #separate dose volume into number and units, units either NA or ml/mL mutate(dose_volume=as.numeric(dose_volume)) %>% mutate(dose_level_new = ifelse(tolower(dose_level_original_units)=="ug/ml", dose_level_original*dose_volume/1e+3, # ug >- mg, vol in mL dose_level_original), #convert ug/ml to mg dose_level_new_units = ifelse(tolower(dose_level_original_units)=="ug/ml", "mg", dose_level_original_units)) # Convert ppm units to mg/L (assuming ppm is in air) - add MW to dataframe to do this df.reduced <- df.reduced %>% left_join(chem.physical_and_invitro.data[,c("DTXSID","MW")], by=c("test_substance_dtxsid"="DTXSID")) df.reduced$MW.y=NULL colnames(df.reduced)=gsub("MW.x","MW",colnames(df.reduced)) #fix MW df.reduced = df.reduced %>% # mutate(MW = as.numeric(MW)) %>% group_by(test_substance_dtxsid) %>% mutate(dose_level_new = ifelse(dose_level_original_units=="ppm" , dose_level_new * convert_units(input.units="ppmw", #?? whatever output.units = "mg/l", MW = unique(MW)), dose_level_new), # dose_level_new = # ifelse(dose_level_original_units=="ppm" & administration_route_original!="dermal vapor", # dose_level_new * # convert_units(input.units="ppmw", #?? whatever # output.units = "mg/l", # MW = unique(MW), state="liquid"), # dose_level_new), dose_level_new_units = ifelse(tolower(dose_level_original_units)=="ppm", "mg/L", dose_level_new_units)) df.reduced$dose_level_new_units[which(df.reduced$dose_level_new_units=="ppm")]="mg/L" # Set normalized vehicle df.reduced <- df.reduced %>% mutate(dose_vehicle_normalized = ifelse(grepl("corn oil",dose_vehicle) | grepl("ethanol",dose_vehicle),"octanol", ifelse(!is.na(dose_vehicle) & !grepl("Cookie",dose_vehicle), "water", NA))) # Remove added chemical MW column for now df.reduced <- df.reduced %>% select(!MW) # Display dosing scenarios table <- df.reduced %>% mutate(dose_level_original=signif(dose_level_original,3), dose_level_new = signif(dose_level_new,3), Chemical = get_chem_id(dtxsid=test_substance_dtxsid)$chem.name) %>% count(Chemical,dose_level_original, dose_level_original_units, dose_level_new, dose_level_new_units, dose_volume, dose_volume_units, dose_vehicle, dose_vehicle_normalized, dermal_applied_area, dermal_applied_area_units,sort=F) %>% rename("Number.of.Data.Points" = n) row.names(table) <- 1:nrow(table) knitr::kable(table, #row.names=TRUE, caption="Dosing Scenarios in CvT Data", floating.environment="sidewaystable", align="c") ``` ## Display which chemicals are being looked at ```{r summarize_cvt2, eval=run.simulations} table <- df %>% mutate(Chemical = get_chem_id(dtxsid=test_substance_dtxsid)$chem.name) %>% dplyr::count(Chemical,test_substance_dtxsid,extraction_document_pmid,sort=F) %>% dplyr::rename(Number.of.Data.Points = n) row.names(table) <- 1:nrow(table) knitr::kable(table, row.names=TRUE,caption="List of chemicals with dermal data before cleaning data.") ``` Now, update certain experimental parameters. If body weight is not reported, set it as the population mean, calculate skin area, and specify the model output name that corresponds to each medium. Also, we need to extract dose duration information for explicit values for the length of duration and the length of duration units (in hours, making conversions when necessary). ```{r 1_updateparms, eval=run.simulations} df.new=df.reduced df.new$species = tolower(df.new$species) # get dose duration in numbers and time units df.new$dose_duration_num = as.numeric(str_extract(df.new$dose_duration, "[0-9]+")) df.new$dose_duration_units = str_extract(df.new$dose_duration, "[aA-zZ]+") df.cases <- df.new %>% mutate(BW = ifelse(!is.na(weight_kg),weight_kg, case_when( species=="rat" ~ 0.4, species=="human" ~ 70, species=="mouse" ~ 0.025, species=="rabbit" ~ 1.6 )), # Set human default height to 175 cm if not listed height_cm = ifelse(species=="human" & is.na(height_cm),175,height_cm), totalSA = case_when( species=="rat" ~ 9.83*(BW*1000)^(2/3), # formula from Gouma et al., 2012, 195-240grams species=="human" ~ sqrt(height_cm * BW/3600) * 100^2, species=="mouse" ~ 9.83*(BW*1000)^(2/3), species=="rabbit" ~ 100*11*(BW)^(2/3) # formula from Tadashi et al., 2018 ), #Assume Fskin_exposed = 10% if not stated Fskin_exposed = ifelse(!is.na(dermal_applied_area),dermal_applied_area/totalSA,0.1), #Infinite Dose if dosing units are concentration mg/l AJG 4/13/24: Double check that mg/l are listed in the correct places in the previous chunk. InfiniteDose = ifelse(dose_level_new_units=="mg/L",1,0), # omg Vvehicle = ifelse(tolower(dose_volume_units)=="ml",dose_volume/1e3,dose_volume), #Standardize time units for dose duration AJG 5/7/24 dose_duration_units = case_when( dose_duration_units %in% c("hr","h","hour","hrs","hours") ~ "hours", dose_duration_units %in% c("min","m","minute","minutes","mins") ~ "mins", dose_duration_units %in% c("day","d","days") ~ "days" ), dose_duration_num2 = ifelse(dose_duration_units=="mins", dose_duration_num/60, dose_duration_num), dose_duration_units2 = ifelse(dose_duration_units=="mins", "hours", dose_duration_units) ) # remove experiments where conc_medium_normalized = feces and set observation to collect df.cases = subset(df.cases, !(conc_medium_normalized %in%c("urine","feces"))) df.cases = df.cases%>% mutate(conc_medium_obs = recode(conc_medium_normalized, "blood"="Cven", "liver"="Cliver", "lung"="Clung", "kidney"="Ckidney", "plasma"="Cplasma", "serum"="Cplasma", "expired air" = "Aexhaled" )) # Display dosing scenarios table <- df.cases %>% mutate(dose_level_original=signif(dose_level_original,3), dose_level_new = signif(dose_level_new,3), Chemical = get_chem_id(dtxsid=test_substance_dtxsid)$chem.name) %>% count(Chemical, species,dose_level_original, dose_level_original_units, dose_level_new, dose_level_new_units, dose_volume, dose_volume_units, dose_vehicle, dose_vehicle_normalized, dose_duration_num2, dermal_applied_area, dermal_applied_area_units, extraction_document_pmid,sort=F) row.names(table) <- 1:nrow(table) knitr::kable(table, #row.names=TRUE, caption="Dosing Scenarios in CvT Data", floating.environment="sidewaystable", align="c") write.table(table, file=paste0(loc.wd,"/tables/CvTdbDosingScenarios.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") ``` Finally, we clarify exhalation (depending on chemical volatility) and set vehicle volume parameters and save the entire cleaned datasheet as concentration_data_meade2023. ```{r 1_mergetable, eval=run.simulations} df.cases_new = df.cases df.cases_new$Kvehicle2water = df.cases_new$dose_vehicle_normalized # set this parameter df.cases_new = df.cases_new %>% mutate( administration_route_normalized = case_when( administration_route_normalized == "oral" ~ "Oral", administration_route_normalized == "iv" ~ "IV", administration_route_normalized == "dermal" ~ "Dermal" ), #Add exhalation for volatile compounds Exhalation = case_when( levels(Volatility)[Volatility] == "Very Volatile" ~ TRUE, levels(Volatility)[Volatility] == "Volatile" ~ TRUE, levels(Volatility)[Volatility] == "Semi-Volatile" ~ TRUE, levels(Volatility)[Volatility] == "Not Volatile" ~ FALSE ), # Set default vehicle volume when it is unknown. Assume a uniform 1 mm layer of vehicle on skin. Then there is 0.0001 L vehicle per cm^2 of exposed skin. For cases with non-dermal routes of exposure, these values are NA. Vvehicle = case_when( administration_route_normalized=="Dermal" ~ ifelse(!is.na(Vvehicle), Vvehicle, Fskin_exposed*totalSA*0.0001), administration_route_normalized=="Oral" ~ NA, administration_route_normalized=="IV" ~ NA) ) # save what we want concentration_data_meade2023 = df.cases_new cgwtools::resave(concentration_data_meade2023,file=paste0("meade2023_",format(Sys.time(), "%b_%d_%Y"),".Rdata")) ``` ## Step Three: Simulate Data Now, we will simulate the *in vivo* concentration vs time data using the httk dermal model. Using `concentration_data_meade2023`, which is clean and filtered above, we create a new data set called `df.simulations` which will contain simulation data. We simulate 6 different dermal scenarios: * Potts-Guy, vehicle=octanol * Potts-Guy, vehicle=water * Surrey, vehicle=octanol * Surrey, vehicle=water * 2 Compartment, vehicle=octanol * 2 Compartment, vehicle=water ### Simulation Set-Up Specify the units we want returned and set up simulation data. ```{r 1_simsetup, eval=run.simulations} # Set output units for dermal model to be same as data output.units.original <- c( "Agut"="mg", "Agutlumen"="mg", "Aliver"="mg", "Aven"="mg", "Alung"="mg", "Arest"="mg", "Akidney"="mg", "Atubules"="mg", "Ametabolized"="mg", "Avehicle"="mg", "Ain"="mg", "Cgut"="mg/L", "Cliver"="mg/L", "Cven"="mg/L", "Clung"="mg/L", "Cart"="mg/L", "Crest"="mg/L", "Ckidney"="mg/L", "Cplasma"="mg/L", "Cvehicle"="mg/L", "Cvehicle_infinite"="mg/L") df.simulation = concentration_data_meade2023 # Find experiments with unique dosing regimes/species/etc and average those that had multiple time points # for the same type of experiment. Rename each experiment with new_id. keep.cols=c("PREFERRED_NAME","test_substance_dtxsid", "conc_medium_obs", "conc_medium_normalized", "administration_route_normalized", "species", "BW", "dose_level_new", "dose_level_new_units","totalSA","Vvehicle","InfiniteDose","dose_duration_num2","dose_duration_units2","Fskin_exposed","Exhalation","dose_vehicle_normalized","Water.Solubility.mg.L", "Vapor.Pressure.mmHg","Boiling.Point.C","logP", "Volatility", "Kvehicle2water") df.simulation_red=data.frame(df.simulation%>%group_by(across(all_of(c("time_hr",keep.cols) )))%>% summarize(conc_normalized=mean(conc_normalized),.groups="drop")%>%group_by(across(all_of(keep.cols)))%>% arrange(time_hr)%>% mutate(new_id=cur_group_id())%>% ungroup()) df.simulation_red2=df.simulation_red%>%mutate(across(c(dose_duration_num2,dose_duration_units2,Vvehicle),~ifelse(is.na(.x),list(NULL),.x))) ``` ### Simulations: Nested For Loops We have nested for loops to account for each aspect of the data that needs to be simulated. Each ID number (or "new_id") in df.simulation_red2 corresponds to a single experiment (or experiment with averaged concentration data if the subject species, dose, route, etc. was the same), where we have defined the necessary variables: species, chemical, dose, dose duration (when applicable), whether or not an Infinite dose is applicable, route of exposure, the collected medium/tissue being sampled (i.e., plasma, blood, liver) and subject parameters such as body mass (kg) and surface area of skin. Thus, we run through each unique ID and first parameterize the model using parameterize_dermal_pbtk() and then solve the model using solve_dermal_pbtk(). For each ID, we simulate each permeability method (Surrey and Potts-Guy); for each ID/permeability method, we simulate either a water or octanol vehicle; then for each combination, we simulate the model both with and without exhalation. Note that some experimental data indicate the vehicle used, but we still test both vehicles for simulation to compare. Likewise, we calculated volatility for each chemical (which indicate whether or not exhalation actually occurs), but we still generate for all options. Note that solve_dermal_pbtk() can handle IV or oral exposure by setting the "route" input variable. Each model output is saved to a list called conc_predictions_meade2023LIST. Then this list is converted to a data frame. ```{r 1_simulations,eval=run.simulations} df.run.sims = df.simulation_red2 fkID = unique(df.run.sims$new_id) method.perm = c("Potts-Guy", "UK-Surrey")# for Kvehicle2water=c("octanol","water") # for exhalation_for_simulation = c("TRUE","FALSE") conc_predictions_meade2023LIST=list() for (include.e in exhalation_for_simulation){ for(i in 1:length(fkID)){ for(mp in 1:length(method.perm)){ for(k in 1:length(Kvehicle2water)){ sub.df = subset(df.run.sims, new_id == fkID[i]) specs = unique(sub.df[,c("test_substance_dtxsid","conc_medium_obs","administration_route_normalized", "species","BW","dose_level_new","dose_level_new_units", "totalSA","Vvehicle","InfiniteDose","dose_duration_num2","dose_duration_units2", "Fskin_exposed","Exhalation")]) # check for experiments with only 1 timepoint and reset sub.df if(length(sub.df$time_hr)==1){ sub.df=rbind(sub.df, sub.df[rep(1,1),]) sub.df[1,c("conc_normalized","time_hr")]=0 } parms = parameterize_dermal_pbtk(dtxsid = specs$test_substance_dtxsid, species = specs$species, totalSA = specs$totalSA, default.to.human=T, Kvehicle2water = Kvehicle2water[k], InfiniteDose = specs$InfiniteDose, method.permeability = method.perm[mp], model.type="dermal_1subcomp", clint.pvalue.threshold = Inf, suppress.messages = T ) parms$BW = specs$BW parms$Fskin_exposed = specs$Fskin_exposed # Remove exhalation or keep exhalation (except for collected air which is ) if(include.e == FALSE) {parms$Qalvc = 0} if(unique(sub.df$PREFERRED_NAME!="DCM")) { # tolerance is bad for DCM, so set tolerance to default values for all other compounds. rtol=1e-5; atol=1e-5 }else{ rtol=1e-4; atol=1e-4 } sub.df=sub.df[order(sub.df$time_hr),] timey = unique(c(0,sub.df$time_hr)) #time in hours gc() # garbage collection (memory) ok=data.frame(solve_dermal_pbtk(parameters = parms, times = timey/24, #convert to days route = tolower(specs$administration_route_normalized), model.type="dermal_1subcomp", plot = FALSE, output.units = output.units.original, washoff = TRUE, input.units = specs$dose_level_new_units, initial.dose=specs$dose_level_new, dose.duration = unlist(specs$dose_duration_num2), dose.duration.units = unlist(specs$dose_duration_units2), Vvehicle = unlist(specs$Vvehicle), #dosing.dermal = dosing.dermal, Kvehicle2water = Kvehicle2water[k], suppress.messages = TRUE, atol = atol, rtol= rtol, method="lsoda" )) # extract only times we need ok.time = subset(ok, time%in%c(ok$time[!(is.na(match(ok$time, sub.df$time_hr/24)))])) sub.df$Prediction = ok.time[,c(specs$conc_medium_obs)] sub.df$exhalation_for_simulation = include.e # specify vehicle and perm. method and vehicle if dermal if(specs$administration_route_normalized != "Dermal"){ sub.df$method.perm = NA sub.df$Kvehicle2water = NA sub.df$method.name = specs$administration_route_normalized }else{ sub.df$method.perm = method.perm[mp] sub.df$Kvehicle2water = Kvehicle2water[k] sub.df$method.name = paste(method.perm[mp], "- vehicle = ", Kvehicle2water[k]) } conc_predictions_meade2023LIST[[paste0(fkID[i],".",Kvehicle2water[k],".",method.perm[mp],".exh",include.e)]]= sub.df } } } } #Bind data together to create a dataframe and organize some columns conc_predictions_meade2023DF = do.call(rbind, conc_predictions_meade2023LIST) rownames(conc_predictions_meade2023DF)=NULL cgwtools::resave(conc_predictions_meade2023DF, conc_predictions_meade2023LIST, file=paste0("meade2023_",format(Sys.time(), "%b_%d_%Y"),".Rdata")) ``` Now, if run.simulations = FALSE, we must load data. This load supptab1_meade2023, concentration_data_meade2023, conc_predictions_meade2023DF, conc_predictions_meade2023LIST. ```{r load_data, eval=(make.plots & !run.simulations)} load(paste0(loc.wd, "/meade2023_Jun_13_2025.Rdata")) ``` ## Step Four: Calculate Root Mean Squared Log10 Errors (RMSLE) Now, we want to get the RMSLE for each chemical, comparing the model to the data. Let us group by chemical, method, and whether or not we include exhalation. ```{r 1_prepareData, eval = make.plots} conc_predictions_meade2023DF$method.perm = gsub( "UK-Surrey","Surrey", conc_predictions_meade2023DF$method.perm) conc_predictions_meade2023DF$method.name = gsub( "UK-Surrey","Surrey", conc_predictions_meade2023DF$method.name) df.sim = conc_predictions_meade2023DF # Convert aexhaled air from mg to mg/l (assume a room of x liters, for humans - humans # the only species in these exhaled experiments) df.sim[which(df.sim$conc_medium_obs=="Aexhaled"),"Prediction"]=df.sim[which(df.sim$conc_medium_obs=="Aexhaled"),"Prediction"]/10000 df.sim$conc_normalized[which(df.sim$conc_normalized==0)]=1e-10 df.sim$Prediction[which(df.sim$Prediction==0)]=1e-10 df.sim.rmsle=df.sim %>% group_by(PREFERRED_NAME, method.name, exhalation_for_simulation)%>% mutate(cmax_obs = max(conc_normalized), cmax_pred=max(Prediction)) %>% mutate(RMSLE = rmse(log10(conc_normalized),log10(Prediction)), Cmax_RMSLE = rmse(log10(cmax_obs), log10(cmax_pred))) # Linear fit ddf.sim.rmsle <- df.sim.rmsle %>% group_by(method.name) %>% do( { Adjusted.R2 = summary(lm(log10(Prediction) ~ log10(conc_normalized),data=.))$adj.r.squared Coefficient = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$coefficients[2] Intercept = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$coefficients[1] Residuals = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$residuals Fitted.Values = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$fitted.values data.frame(.,Adjusted.R2,Coefficient,Intercept,Residuals,Fitted.Values) } ) ``` ## Step Five: Plot Simulations Let us compare the RMSLE and the volatility (i.e., boiling point) for each compound (assuming exhalation). ```{r 1_plot3, eval = make.plots, fig.height=10, fig.width=20} # Plot RMSLE vs Boiling Point data = subset(df.sim.rmsle, exhalation_for_simulation==TRUE) data=data%>%group_by(PREFERRED_NAME, Boiling.Point.C,method.name)%>% summarize(Chemical.cmax.RMSLE = rmse(log10(cmax_obs),log10(cmax_pred)), Chemical.RMSLE=rmse(log10(conc_normalized),log10(Prediction))) data.rect = data.frame(xmin=c(0,75,250,400), xmax=c(75,250,400,Inf), ymin=rep(-Inf,4), ymax=rep(Inf,4), Volatility=c("Very Volatile", "Volatile", "Semi-Volatile", "Not Volatile")) data.rect$Volatility <- factor(data.rect$Volatility, levels=data.rect$Volatility) viridis.colors <- viridis(4) plot <- ggplot(data=data,aes(x=`Boiling.Point.C`,y=Chemical.RMSLE)) + geom_point()+ geom_smooth(method="lm",formula='y~x') + geom_text_repel(data=data,aes(x=`Boiling.Point.C`,y=Chemical.RMSLE,label=PREFERRED_NAME), max.overlaps=Inf)+ facet_wrap(~method.name,scales="free") + #scale_x_log10() + geom_rect(data=data.rect, aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax,fill=Volatility), alpha=0.5, inherit.aes=FALSE) + scale_fill_manual(values=c(`Not Volatile`=viridis.colors[1], `Semi-Volatile`=viridis.colors[2], `Volatile`=viridis.colors[3], `Very Volatile`=viridis.colors[4]))+ labs(y="RMSLE",title="Boiling Point vs. RMSLE",x="Boiling Point (C)")+ theme_bw() + theme(text=element_text(size=33), plot.title=element_text(hjust=0.5), legend.position="bottom") plot ggsave(paste(getwd(), "/Figures_May1/BPvsRMSLE_suppl.png",sep=""), width = 20, height = 10, units = "in") plot2 <- ggplot(data=data,aes(x=`Boiling.Point.C`,y=Chemical.cmax.RMSLE)) + geom_point()+ geom_smooth(method="lm",formula='y~x') + geom_text_repel(data=data,aes(x=`Boiling.Point.C`,y=Chemical.cmax.RMSLE,label=PREFERRED_NAME), max.overlaps=Inf)+ facet_wrap(~method.name,scales="free") + #scale_x_log10() + geom_rect(data=data.rect, aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax,fill=Volatility), alpha=0.5, inherit.aes=FALSE) + scale_fill_manual(values=c(`Not Volatile`=viridis.colors[1], `Semi-Volatile`=viridis.colors[2], `Volatile`=viridis.colors[3], `Very Volatile`=viridis.colors[4]))+ labs(y="RMSLE",title="Boiling Point vs. Cmax RMSLE",x="Boiling Point (C)")+ theme_bw() + theme(text=element_text(size=33), plot.title=element_text(hjust=0.5), legend.position="bottom") #annotate_figure(plot2,top=text_grob("Boiling Point vs RMSLE",size=40,face="bold")) ggsave(paste(getwd(), "/Figures_May1/BPvsRMSLE_cmax_suppl.png",sep=""), width = 20, height = 10, units = "in") ``` Now, we compare simulations with exhalation to those without exhalation (as a function of boiling point). ```{r 1_exhalation_rmsle_plot, eval=make.plots, fig.height=10, fig.width=30} df.exh.compare=df.sim.rmsle Exh = data.frame(df.exh.compare %>% group_by(PREFERRED_NAME,method.name, Boiling.Point.C)%>% arrange(exhalation_for_simulation) %>% summarize(Cmax=Cmax_RMSLE[exhalation_for_simulation==FALSE]-Cmax_RMSLE[exhalation_for_simulation==TRUE], Full_time_course = RMSLE[exhalation_for_simulation==FALSE]-RMSLE[exhalation_for_simulation==TRUE])) Exh$method.name= sub(" - ", " -\n", Exh$method.name) #stack data stackExh = data.frame(Exh %>% pivot_longer(cols = c("Cmax", "Full_time_course"), names_to = "ind", values_to = "values")) data.text = unique(stackExh[,c("Boiling.Point.C","values","PREFERRED_NAME","ind","method.name")]) data.rect = data.frame(xmin=c(0,75,250,400), xmax=c(75,250,400,Inf), ymin=rep(-Inf,4), ymax=rep(Inf,4), Volatility=c("Very Volatile", "Volatile", "Semi-Volatile", "Not Volatile")) data.rect$Volatility <- factor(data.rect$Volatility, levels=data.rect$Volatility) viridis.colors <- viridis(4) # rename labels for facets # labeller=labeller(ind = c(Cmax="Cmax", # Full_time_course = "Full time-course")) plot <- ggplot(data=stackExh,aes(x=Boiling.Point.C,y=values)) + geom_point()+ #geom_line()+ theme_bw()+ geom_smooth(method='lm') + facet_grid(ind~method.name, scales="free", labeller=labeller(ind = c(Cmax="Cmax", Full_time_course = "Full time-course")))+ geom_rect(data=data.rect, aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax,fill=Volatility), alpha=0.5, inherit.aes=FALSE) + scale_fill_manual(values=c(`Not Volatile`=viridis.colors[1], `Semi-Volatile`=viridis.colors[2], `Volatile`=viridis.colors[3], `Very Volatile`=viridis.colors[4]))+ geom_text_repel(data=data.text,aes(x=Boiling.Point.C,y=values,label=PREFERRED_NAME), max.overlaps=Inf)+ #facet_grid(cols=vars(Route.or.Method.if.Dermal), rows=vars(RMSLE.Type),scales="free_y")+ #scale_x_log10() + geom_hline(aes(yintercept=0)) + labs(y=expression(Delta~"RMSLE"),x="Boiling Point (C)",title="Change in RMSLE when Exhalation is Added")+ scale_color_brewer(type="qual",palette="Paired") + theme(axis.text.x = element_text(angle=45,vjust=1,hjust=1), plot.title=element_text(hjust=0.5), text=element_text(size=30), legend.position="bottom") plot ggsave(paste(getwd(), "/Figures_May1/BP_RMSLE_Exhalation_main.png",sep=""), width = 20, height = 10, units = "in") # P-values for regression lines stackExhStats = stackExh%>% group_by(ind, method.name) %>% summarize(model = list(lm(values~Boiling.Point.C, data=cur_data())), .groups="drop")%>% rowwise()%>% mutate(msummary=list(summary(model)), rsquared=msummary$r.squared, p_val= pf(msummary$fstatistic[1], msummary$fstatistic[2], msummary$fstatistic[3], lower.tail=FALSE) )%>%select(ind,method.name,rsquared,p_val) ``` Now, let's generate CvT plots for each chemical and color by whether or not exhalation was included. ```{r cvt_plot_by_exh, eval =make.plots, fig.height=3, fig.width=3 } df.sim2 = df.sim.rmsle chem = unique(df.sim2$PREFERRED_NAME) for(chx in 1:length(chem)){ each.chem = subset(df.sim2, PREFERRED_NAME==chem[chx]) #each.chem$dose = paste(each.chem$dose_level_new, each.chem$dose_level_new_units) each.chem$dose=paste(each.chem$dose_level_new, each.chem$dose_level_new_units) # each.chem$dose = factor(each.chem$dose, # levels=unique(each.chem$dose)[order(unique(each.chem$dose_level_new))]) each.chem$method.name= sub(" - ", " -\n", each.chem$method.name) each.chem$subject=paste(each.chem$species, "BW",each.chem$BW, "kg") ggplot(each.chem)+ geom_point(data=each.chem, aes(x=time_hr, y=conc_normalized,colour=dose))+ geom_line(data=each.chem, aes(x=time_hr, y=Prediction,colour=dose,linetype=exhalation_for_simulation),linewidth=1)+ labs(title=paste(chem[chx], "\n Log-transformed Concentrations v time"), x=" Time (hr)", y="Log Concentrations (mg/L)", colour="")+ facet_nested(~conc_medium_normalized+method.name,scales="free")+ scale_y_log10() + theme(legend.position = "bottom") ggsave(paste(getwd(), "/Figures_May1/cvt_eachchem_by_exhalation_suppl", unique(each.chem$test_substance_dtxsid),".png",sep=""), width = 8, height = 6, units = "in") } ``` Now plot the data in Predicted vs Observed (PvO) plots for each of the 13 chemicals. First, we extract only simulated results that match the correct exhalation conditions by finding simulations where exhalation_for_simulations = T if Exhalation (which was calculated from volatility parameters), is TRUE and FALSE if FALSE. ```{r 1_plot1,eval =make.plots, fig.height=3, fig.width=3} df.new = subset(df.sim.rmsle, exhalation_for_simulation==Exhalation) chem = unique(df.new$PREFERRED_NAME) for(chx in 1:length(chem)){ each.chem = subset(df.new, PREFERRED_NAME==chem[chx]) #each.chem$dose = paste(each.chem$dose_level_new, each.chem$dose_level_new_units) each.chem$dose=paste(each.chem$dose_level_new, each.chem$dose_level_new_units) each.chem$dose = factor(each.chem$dose, levels=unique(each.chem$dose)[order(unique(each.chem$dose_level_new))]) each.chem$method.name= sub(" - ", " -\n", each.chem$method.name) each.chem$subject=paste(each.chem$species, "BW",each.chem$BW, "kg") ggplot(each.chem, aes(x=conc_normalized,y=Prediction))+ geom_point(aes(colour=dose,shape=conc_medium_normalized))+ labs(title=paste(chem[chx], "\n Log-transformed Concentrations"), x="Observed Concentrations (mg/L)", x="Predicted Concentrations (mg/L)", colour="", shape="")+ #facet_wrap(~dose,scales="free")+ facet_wrap(~method.name,scales="free")+ geom_smooth(method='lm',formula='y~x',colour="black")+ geom_abline(linetype=2)+ #geom_smooth(aes(color=method.name),method='lm',formula='y~x')+ scale_x_log10() + scale_y_log10() + theme_bw() + theme(text=element_text(size=11), strip.text=element_text(size=13)) #theme(legend.position = "bottom") ggsave(paste(getwd(), "/Figures_May1/pvo_bydose_suppl", unique(each.chem$test_substance_dtxsid),".png",sep=""), width = 6, height = 6, units = "in") } # Linear fit for these. linear.fit.new <- df.new %>% group_by(method.name) %>% do( { Adjusted.R2 = summary(lm(log10(Prediction) ~ log10(conc_normalized),data=.))$adj.r.squared Coefficient = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$coefficients[2] Intercept = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$coefficients[1] Residuals = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$residuals Fitted.Values = lm(log10(Prediction) ~ log10(conc_normalized),data=.)$fitted.values data.frame(.,Adjusted.R2,Coefficient,Intercept,Residuals,Fitted.Values) } ) # RMSLE grouped by method type rmsle.all = df.new%>%group_by(method.name)%>% summarize( Adjusted.R2 = summary(lm(log10(Prediction) ~ log10(conc_normalized)))$adj.r.squared, RMSLE_total =rmse(log10(Prediction),log10(conc_normalized)), CMAX_RMSLE_total =rmse(log10(cmax_pred) ,log10(cmax_obs) ) ) ``` now plot log-transformed predicted vs. observed values and group by permeability method, vehicle (or route of exposure), and tested tissue. ```{r 1_plot6, eval=make.plots, fig.height=10, fig.width=20} df.new = subset(df.sim, exhalation_for_simulation==Exhalation) df.sim.rmsle2=df.new %>% group_by(PREFERRED_NAME, method.name)%>% mutate(cmax_obs = max(conc_normalized), cmax_pred=max(Prediction)) %>% mutate(RMSLE = rmse(log10(conc_normalized),log10(Prediction)), Cmax_RMSLE = rmse(log10(cmax_obs), log10(cmax_pred))) col=c("firebrick", "indianred", "blue4","steelblue1","#2ca25f", "#addd8e") plot.pvo=ggplot(df.sim.rmsle2, aes(x=conc_normalized, y=Prediction))+ geom_point(aes( colour=method.name,shape=PREFERRED_NAME), alpha=1,size=2)+ labs(title=paste0("(A) Concentrations"), y = "Simulated (mg/L)", x="Observed (mg/L)", color="Route or\n Method",shape="Chemical")+ scale_shape_manual(values=1: length(unique(df.sim.rmsle2$PREFERRED_NAME)))+ geom_smooth(aes(colour=method.name),method="lm",se=F)+ #scale_color_brewer(type="qual",palette="Paired",direction=-1) + scale_colour_manual(values=col)+ theme_bw() + scale_x_log10() + scale_y_log10() + geom_abline(lty="dashed") + #coord_fixed(xlim=lims,ylim=lims) + theme(plot.title=element_text(hjust=0.5), text=element_text(size=30)) plot.cmax = ggplot(df.sim.rmsle2, aes(x=cmax_obs, y=cmax_pred))+ geom_point(aes( colour=method.name,shape=PREFERRED_NAME), alpha=1,size=2)+ labs(title=paste0("(B) Max Concentrations"), y = "Simulated (mg/L)", x="Observed (mg/L)", color="Route or\n Method", shape="Chemical")+ #scale_color_brewer(type="qual",palette="Paired",direction=-1) + scale_colour_manual(values=col)+ scale_shape_manual(values=1: length(unique(df.sim.rmsle2$PREFERRED_NAME)))+ theme_bw() + scale_x_log10() + scale_y_log10() + geom_abline(lty="dashed") + #coord_fixed(xlim=lims,ylim=lims) + theme(plot.title=element_text(hjust=0.5), text=element_text(size=30)) ggarrange(plot.pvo, plot.cmax, common.legend=T,legend="right") ggsave(paste(getwd(), "/Figures_May1/PvO_main.png",sep=""), width = 20, height = 10, units = "in",dpi=300) method.rmsle.only=df.new %>% group_by(PREFERRED_NAME, method.name)%>% mutate(cmax_obs = max(conc_normalized), cmax_pred=max(Prediction)) %>% summarize(RMSLE = rmse(log10(conc_normalized),log10(Prediction)), Cmax_RMSLE = rmse(log10(cmax_obs), log10(cmax_pred))) %>%group_by(method.name)%>%summarize(mean(Cmax_RMSLE), mean(RMSLE)) ``` Now let's plot a heat map of the RMSLE and Cmax RMSLE. We will subset the dataframe to only plot the simulations appropriate for exhalation (that is, for a volatile chemical where Exhalation = TRUE, we will extract the solutions when exhalation_for_simulation = TRUE).We will also make note of the experiments for which a dosing vehicle was specified (water or octanol). For dermal, we will plot the results using both permeability methods and both vehicle methods, but will mark boxes over the cases where the vehicle matched to the simulated vehicle. We also want to plot the chemicals in order of volatility (based on boiling point). ```{r 1_exhalation_correct, eval=make.plots} # Extract correct exhalation # df = conc_predictions_meade2023DF # df$conc_normalized[which(df$conc_normalized==0)]=1e-10 # df$Prediction[which(df$Prediction==0)]=1e-10 df=df.sim dfmatchE = subset(df, exhalation_for_simulation==Exhalation) # order chemicals by volatility vol.chem = unique(dfmatchE[,c("PREFERRED_NAME","Boiling.Point.C")]) vol.chem2=vol.chem[order(vol.chem$Boiling.Point.C,decreasing=F),] #also want to extract table to identify experiments that actually did list the vehicle as either water or octanol (or similar) dfmatchE2 = data.frame(dfmatchE%>%group_by(PREFERRED_NAME,method.name,dose_vehicle_normalized,Kvehicle2water,administration_route_normalized)%>% mutate(cmax_obs = max(conc_normalized), cmax_pred=max(Prediction)) %>% summarize(Cmax_RMSLE = rmse(log10(cmax_obs),log10(cmax_pred)), RMSLE = rmse(log10(conc_normalized),log10(Prediction)))) # Reorganize the columns so RMSLE values can be grouped by Cmax or total RMSLE and then take average RMSLE values of oral or iv values so we have one per chemical dfmatchE3 = data.frame(dfmatchE2 %>% pivot_longer(cols = c("Cmax_RMSLE", "RMSLE"), names_to = "ind", values_to = "value") %>% group_by(PREFERRED_NAME,ind,administration_route_normalized) %>% mutate(value = if_else(administration_route_normalized !="Dermal", mean(value[administration_route_normalized !="Dermal"]), value) )) # Make PREFERRED_NAME a factor and set the levels to match those from the volatility order. dfmatchE3$PREFERRED_NAME = factor(dfmatchE3$PREFERRED_NAME, levels=vol.chem2$PREFERRED_NAME) ``` Generate the plot. ```{r 1_heatmap, eval=make.plots, fig.height=8, fig.width=11} ggplot(dfmatchE3, aes(y=PREFERRED_NAME, x=method.name))+ geom_tile(aes(fill=value))+ #geom_tile(aes(fill=Chemical.Cmax.RMSLE.NE))+ facet_wrap(~ind)+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+ scale_fill_viridis()+ geom_text(aes(label=round(value,2)),colour="white",size=4)+ labs(title="", x="Route or\n Method", fill="RMSLE",y="Chemical")+ theme_bw() + theme(axis.text.x=element_text(angle=45,vjust=1,hjust=1), axis.text.y=element_text(angle=45,vjust=1.05,hjust=0.9), plot.title=element_text(hjust=0.5), text=element_text(size=20))+ geom_tile(data=subset(dfmatchE3, dose_vehicle_normalized==Kvehicle2water & administration_route_normalized=="Dermal"),aes(y=PREFERRED_NAME, x=method.name), size=1.5,fill=NA,colour="red" ) ggsave(paste(getwd(),"/Figures_May1/RMSLE_compare_main.png",sep=""),width=11,height=8,units="in") ``` # Results 2: IVIVE Now, we perform *in vitro* to *in vivo* extrapolation using the dermal model to obtain some occupational hazardous levels. ## Step One: Load Data First we load [ToxCast Data](https://www.epa.gov/comptox-tools/exploring-toxcast-data) data from the [EPA-ORD-CCTE Clowder](https://clowder.edap-cluster.com/spaces/62bb560ee4b07abf29f88fef). Some of the variables in this dataset of importance include: * `hitc`: if `hitc=1` there was a statistically significant systematic conc-response observed, and if `hitc=0` there was no statistically significant * `mc6_flags`: features of the hit that might be of interest * `modl` type of conc-response model selected (using AIC) + `"cnst"`: no concentration response (constant) + `"hill"`: Hill equation +`"gnls"`: Gain-loss model, product of an increasing and decreasing Hill equations, often interpreted as being caused by cytotoxicity * `model_ga`: log10 uM concentration at which the chemical caused 50% activity (i.e., chemical-assay "potency" or "AC50") * `spid`: sample id, since each chemical might have multiple assays with hits or have been run multiple times (experimental replicates) ```{r 2_load_data, eval=run.simulations.2} # Load in vitro data load("invitrodb_3_5_mc5.Rdata") # Get chemicals of interest load_sipes2017(overwrite=F) ``` ## Step Two: Load Chemical Data ```{r 2_load_chem_data, eval=run.simulations.2} # Using list, download chem info from CompTox Dashboard for chem.list chemicals, and load. df.chem <- read.xlsx(file="CCD-Batch-Search_2023-03-08_12_23_25.xlsx", sheetName="Main Data")# %>%filter(!is.na(BOILING_POINT_DEGC_OPERA_PRED)) df.chem <- type_convert(df.chem,na="N/A") df.chem <- type_convert(df.chem,na="N/A") ``` ## Step Three: Clean Data (Only Non-Volatiles and a few of the Semi-Volatiles from the validation study) ```{r 2_clean_data, eval=run.simulations.2} #Get list of chemicals that can run in dermal model: chem.list <- get_cheminfo(info="DTXSID",species="Human",model="dermal_1subcomp") write.table(chem.list, file=paste0(loc.wd,"/tables/ChemicalList.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") # Take out volatile chemicals chems.from.invivo <- c("DTXSID0020232","DTXSID6022923","DTXSID8024151","DTXSID3022409","DTXSID1026081") df.chem <- df.chem %>% filter(!is.na(BOILING_POINT_DEGC_OPERA_PRED))%>% filter((BOILING_POINT_DEGC_OPERA_PRED >= 400) | (DTXSID %in% chems.from.invivo)) chem.list <- df.chem$DTXSID #get sample to test code with set.seed(100) #chem.list <- sample(chem.list,50) toxcast.dermal <- mc5 %>% subset(dsstox_substance_id %in% chem.list) ``` ## Step Four: Compute Q10.AC50 (and other parameters of interest) ```{r, 2_conc_of_interest, eval=run.simulations.2} # CONCENTRATIONS OF INTEREST - https://doi.org/10.1093/bioinformatics/btw680 ---------------- toxcast.table <- NULL old.time <- Sys.time() num.chem <- length(chem.list) for(k.chem in 1:num.chem){ #for each chemical this.id <- chem.list[k.chem] # Modulus operator if(k.chem %% 100==0){ new.time <- difftime(Sys.time(), old.time,units="secs") cat(paste0("Run ", k.chem,":",round(new.time)," secs \n")) } toxcast.dermal.chem <- subset(toxcast.dermal, dsstox_substance_id == this.id) #subset over chemical toxcast.dermal.chem.hits <- subset(toxcast.dermal.chem, hitc==1) #only look at hits if(dim(toxcast.dermal.chem.hits)[1]>0){ #if there were any hits this.row <- data.frame(Chemical = as.character(toxcast.dermal.chem.hits[1,"chnm"]), DTXSID = this.id, Total.Assays = dim(toxcast.dermal.chem)[1], Unique.Assays = length(unique(toxcast.dermal.chem$aeid)), Total.Hits = dim(toxcast.dermal.chem.hits)[1], Unique.Hits = length(unique(toxcast.dermal.chem.hits$aeid)), Low.AC50 = min(toxcast.dermal.chem.hits$modl_ga), Low.AC10 = min(toxcast.dermal.chem.hits$modl_ac10), Low.ACC = min(toxcast.dermal.chem.hits$modl_acc), Q10.AC50 = quantile(toxcast.dermal.chem.hits$modl_ga,probs=0.1), Q10.AC10 = quantile(toxcast.dermal.chem.hits$modl_ac10,probs=0.1), Q10.ACC = quantile(toxcast.dermal.chem.hits$modl_acc,probs=0.1), Med.AC50 = quantile(toxcast.dermal.chem.hits$modl_ga,probs=0.5), Med.AC10 = quantile(toxcast.dermal.chem.hits$modl_ac10,probs=0.5), Med.ACC = quantile(toxcast.dermal.chem.hits$modl_acc,probs=0.5), Q90.AC50 = quantile(toxcast.dermal.chem.hits$modl_ga,probs=0.9), Q90.AC10 = quantile(toxcast.dermal.chem.hits$modl_ac10,probs=0.9), Q90.ACC = quantile(toxcast.dermal.chem.hits$modl_acc,probs=0.9) ) toxcast.table <- rbind(toxcast.table, this.row) } } rownames(toxcast.table) <- seq(1,dim(toxcast.table)[1]) # set rownames to be sequential numbers #View table knitr::kable(head(toxcast.table[,1:6]), caption = "Summarized ToxCast Data", floating.environment="sidewaystable") write.table(toxcast.table, file=paste0(loc.wd,"/tables/ToxCastSummaryTable.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") ``` ## Step Five: Conduct *In Vitro* to *In Vivo* Extrapolation ```{r 2_equivalant_dose, eval=run.simulations.2} # Get Cmax = max(Cplasma) from dermal model for each chemical in toxcast table old.time <- Sys.time() include.these=which(toxcast.table$DTXSID %in% get_cheminfo(info="dtxsid", suppress.messages=T)) toxcast.table2 = subset(toxcast.table,DTXSID %in% toxcast.table$DTXSID[include.these]) atol <- rtol <- 1e-5 num.chem <- length(toxcast.table2$DTXSID) plot.each=F # Set Dermal Solutions method.ls <- c("Potts-Guy","UK-Surrey") toxcast.ls=list() for(k.chem in 1:num.chem){ this.id = toxcast.table2$DTXSID[k.chem] suppress.messages = TRUE for(k.method in 1:2){ this.method <- method.ls[k.method]; # if (this.method=="2-Compartment") this.method.input <- "NULL" this.model.type <-"dermal_1subcomp" end.time=0 parms=parameterize_dermal_pbtk(dtxsid=this.id, model.type = this.model.type, method.permeability = this.method, clint.pvalue.threshold=Inf, suppress.messages=TRUE, Kvehicle2water = 1, species="Human", default.to.human = T) parms$InfiniteDose=1 #if(end.time<2){ #make sure solver finishes out=try( solve_dermal_pbtk(parameters=parms, model.type=this.model.type, method.permeability=this.method, #Kvehicle2water=1, #vehicle is water days=5, initial.dose = 1, input.units = "ppm", dose.duration=8, dose.duration.units="hr", washoff=T, # InfiniteDose=T, atol = atol, rtol= rtol, method="lsoda", suppress.messages=suppress.messages) )#put SA for hands!!! if (is(out,"try-error")){ Cmax=0 browser() } else{end.time <- out[nrow(out),"time"] Cmax <- max(out[,"Cplasma"]) } toxcast.ls[[paste0(k.chem,".",this.method)]]=cbind(toxcast.table2[toxcast.table2$DTXSID==this.id,],Cmax=Cmax,Permeability = this.method) } } toxcast2.all=do.call(rbind,toxcast.ls) # # Calculate the EquivDose in units same as input.units (ppm) toxcast2.all$EquivDose = signif(10^toxcast2.all$Q10.AC50 / toxcast2.all$Cmax, 4) # Edit data toxcast3=toxcast2.all%>% rename(`Permeability Type` = Permeability, `Equivalent Dose`=EquivDose)%>% mutate(Chemical = ifelse(Chemical=="Methyl tert-butyl ether","MTBE",Chemical), Chemical = ifelse(Chemical=="3,3',5,5'-Tetrabromobisphenol A","Bromdian",Chemical), Chemical = ifelse(Chemical=="4,4'-Sulfonyldiphenol","Bisphenol S",Chemical), Chemical = ifelse(Chemical=="Dichloromethane","DCM",Chemical), Chemical = ifelse(Chemical=="Tetrachloroethylene","PERC/TCE",Chemical), Chemical = ifelse(Chemical=="1-Methylbenzene","Toluene",Chemical)) # Remove unused columns ivive_meade2023 <- toxcast3 %>% select(Chemical,DTXSID,`Permeability Type`,`Equivalent Dose`,Cmax) ``` ```{r add_water_solubility, eval=run.simulations.2} chem.props <- subset(chem.physical_and_invitro.data, DTXSID %in% ivive_meade2023$DTXSID)[ ,c("DTXSID","MW","logWSol")] ivive_meade2023 <- merge(ivive_meade2023, all.x = TRUE, chem.props, by="DTXSID") ivive_meade2023 <- within(ivive_meade2023, WS.ppm <- signif(MW*10^logWSol*1.001142303,4)) # Save data cgwtools::resave(ivive_meade2023, file=paste0( loc.wd, "/meade2023_", format(Sys.time(), "%b_%d_%Y"), ".Rdata")) ``` ```{r 2_load_without_running, eval = (make.plots.2 & !run.simulations.2), echo=F} # load file with ivive_meade2023 load("meade2023_Jun_20_2025.Rdata") ``` ## Step Six: Plot Results ```{r identify_potent, eval=make.plots.2} # Modify for plotting ivive_meade2023$`Permeability Type` = gsub("UK-Surrey","Surrey",ivive_meade2023$`Permeability Type`) ivive_meade2023[ivive_meade2023==Inf] = 1e15 write.table(subset(ivive_meade2023, ivive_meade2023[,"Equivalent Dose"] < ivive_meade2023[,"WS.ppm"]), file=paste0(loc.wd,"/tables/IVIVEAchievable.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") ivive_meade2023$`Equivalent Dose` <- log10(ivive_meade2023$`Equivalent Dose`) ivive_meade2023$`Permeability Type` = paste0("Permeability Method is ",ivive_meade2023$`Permeability Type`) write.table(ivive_meade2023, file=paste0(loc.wd,"/tables/IVIVEResults.txt"), col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t") ``` ```{r 2_plot1,fig.width=20,fig.height=12, eval=make.plots.2} # Plot Histogram data = ivive_meade2023 %>% group_by(`Permeability Type`) %>% mutate(Mean = mean(`Equivalent Dose`,na.rm=TRUE)) %>% mutate(Median = median(`Equivalent Dose`,na.rm=TRUE)) AED.labels <- c("Needs Protection (AED caused by <1ppm)", "AED Possibly Acheivable", "\"Safe\" (Unachievable AED)", "No Dermal Penetration") # Shaded rectangles with cutoff at 10,000 ppm: data.rect.1e4ppm = data.frame(xmin=c(-Inf,0,4,12), xmax=c(0,4,12,Inf), ymin=rep(-Inf,4), ymax=rep(Inf,4), Bioactivity=AED.labels) # Shaded rectangles with cutoff at 1,000,000 ppm: data.rect.1e6ppm = data.frame(xmin=c(-Inf,0,6,12), xmax=c(0,6,12,Inf), ymin=rep(-Inf,4), ymax=rep(Inf,4), Bioactivity=AED.labels) data.rect.1e4ppm$Bioactivity <- factor(data.rect.1e4ppm$Bioactivity, levels=data.rect.1e4ppm$Bioactivity) data.rect.1e6ppm$Bioactivity <- factor(data.rect.1e6ppm$Bioactivity, levels=data.rect.1e6ppm$Bioactivity) data.med = unique(data[,c("Permeability Type","Mean","Median")]) %>% pivot_longer(cols=c("Mean","Median"),names_to="Statistic",values_to="Statistic.Value") viridis.colors <- viridisLite::viridis(4) plot <- ggplot(data,aes(x=`Equivalent Dose`)) + geom_histogram(na.rm=TRUE,alpha=0.5,position="identity") + geom_freqpoly(na.rm=TRUE) + geom_vline(data = data.med, aes(xintercept=Statistic.Value,linetype=Statistic)) + geom_rect(data=data.rect.1e4ppm, aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax,fill=Bioactivity), alpha=0.5, inherit.aes=FALSE) + #scale_x_log10() + scale_fill_manual(values=setNames(viridis.colors[4:1], AED.labels))+ labs(title=paste0("Distribution of concentrations causing AED\nfor hands submerged for 8 hours a day\nfor five days"), y = "Number of Chemicals", x="Log10 of Administrered Equivalent Doses (AEDs) ppm in water") + facet_col(vars(`Permeability Type`),scales="fixed") + theme_bw() + theme(plot.title=element_text(hjust=0.5)) plot ggsave(paste(getwd(), "/Figures_May1/histogram2.png",sep=""), width = 20, height = 12, units = "cm") ``` ```{r 2_dataprep_for_plot2, fig.width=15, fig.height=10,eval=make.plots.2} # WATER SOLUBILITY ivive_meade2023 <- ivive_meade2023 %>% mutate(WSol = case_when( ((MW*10^logWSol)>=10) ~ "Water Soluble (WSol>10g/L)", ((MW*10^logWSol)<10) ~ "Not Soluble" )) #ivive_meade2023ALL <- ivive_meade2023WS; ivive_meade2023ALL$WSol = "Non-Water Soluble" ivive_meade2023WS <- ivive_meade2023 %>% filter(WSol=="Water Soluble (WSol>10g/L)") ivive_meade2023NonWS <- ivive_meade2023 %>% filter(WSol=="Not Soluble") # Different bands for soluble/non-soluble: data.rect.1e4ppm$WSol <- "Not Soluble" data.rect.1e6ppm$WSol <- "Water Soluble (WSol>10g/L)" data.rect <- rbind(data.rect.1e4ppm,data.rect.1e6ppm) data.rect[,"Permeability Type"] <- "Permeability Method is Potts-Guy" data.rect2 <- data.rect data.rect2[,"Permeability Type"] <- "Permeability Method is Surrey" data.rect <- rbind(data.rect,data.rect2) ivive_meade2023 <- rbind(ivive_meade2023NonWS,ivive_meade2023WS) data = ivive_meade2023 %>% group_by(`Permeability Type`,`WSol`) %>% mutate(Mean = mean(`Equivalent Dose`,na.rm=TRUE)) %>% mutate(Median = median(`Equivalent Dose`,na.rm=TRUE)) data.med = unique(data[,c("Permeability Type","WSol","Mean","Median")]) %>% pivot_longer(cols=c("Mean","Median"),names_to="Statistic",values_to="Statistic.Value") ``` ```{r 2_plot2, fig.width=15, fig.height=10,eval=make.plots.2} chem.to.label <- c("DTXSID0020232", # "DTXSID6022923", # "DTXSID8024151", "DTXSID3022409", "DTXSID1026081", "DTXSID9048699", "DTXSID0022858","DTXSID1037","DTXSID0023163", "DTXSID5024845" ) data.text = data %>% filter(DTXSID %in% chem.to.label) #PLOT WS plot <- ggplot(data) + geom_rect(data=data.rect, aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax,fill=Bioactivity), alpha=0.5#, #inherit.aes=FALSE ) + geom_histogram(na.rm=TRUE,alpha=0.5,position="identity",aes(x=`Equivalent Dose`)) + geom_freqpoly(na.rm=TRUE,aes(x=`Equivalent Dose`)) + geom_vline(data = data.med, aes(xintercept=Statistic.Value,linetype=Statistic)) + # geom_point(data=data.text,aes(x=`Equivalent Dose`,y=0),size=2) + # geom_text_repel(data=data.text,aes(x=`Equivalent Dose`,y=0,label=Chemical), # max.overlaps=14, # hjust = "right",nudge_x=-1,nudge_y=0.2)+ scale_fill_manual(values=setNames(viridis.colors[4:1], AED.labels))+ #ylim(0,135)+ #geom_text(x=) + labs(title=paste0("Distribution of concentrations causing AED\nfor hands submerged for 8 hours a day\nfor five days"), y = "Number of Chemicals", x="Log10 of Needed Concentration (ppm) in water") + facet_wrap(vars(`Permeability Type`,`WSol`),scales="free_y",ncol=2) + theme_bw() + theme(plot.title=element_text(hjust=0.5), text=element_text(size=20)) plot ggsave(paste(getwd(), "/Figures_May1/histogram_all_main.png",sep=""), width = 15, height = 10, units = "in") ``` Include a table with the equivalent doses, grouped by permeability method ```{r finallist2, eval=FALSE} data2=data[order(data$`Equivalent Dose`),] data2$`Equivalent Dose`= exp(data2$`Equivalent Dose`) data3=data.frame(data2%>%group_by(Chemical,DTXSID, `Permeability Type`,WSol,logWSol)%>% mutate(row=row_number())%>% ungroup() %>% pivot_wider(id_cols=c(Chemical, DTXSID, row, WSol,logWSol), names_from = `Permeability Type`, values_from = c(`Equivalent Dose`), names_sep="")) data4=data3[,-3] data4=data4[order(data4$Permeability.Method.is.Surrey),] ``` Also, calculate permeability for each method for humans ```{r finallist, eval=FALSE} pmat=list() pm=c("UK-Surrey","Potts-Guy") kw=c("water","octanol") for(i in 1:length(supptab1_meade2023$DTXSID)){ for(m in 1:2){ #each method for(k in 1:2){ # each vehicle p=parameterize_dermal_pbtk(dtxsid=supptab1_meade2023$DTXSID[i], method.permeability =pm[m] ,species="Human", Kvehicle2water = kw[k]) pmat[[paste(i,".",m,".",k)]]=cbind(p$P, pm[m], kw[k],p$MW, supptab1_meade2023$DTXSID[i]) } } } pmat2=do.call(rbind,pmat) colnames(pmat2)=c("Perm", "Method","Vehicle","MW","DTXSID") pmat2=data.frame(pmat2) pmat2$Method=gsub("UK-Surrey","Surrey",pmat2$Method) pmat2$Vehicle=paste("Vehicle is", pmat2$Vehicle) pmat3=merge(pmat2,supptab1_meade2023, by="DTXSID") ggplot(pmat3,aes(x=as.numeric(MW.x),y=as.numeric(Perm)))+ geom_point()+ facet_grid(Method~Vehicle,scales="free")+ labs(x="MW",y="Kp - Permeability") ggsave(paste(getwd(), "/Figures_May1/MWvPerm.png",sep=""), width = 4, height = 3.5, units = "in") ggplot(pmat3,aes(x=as.numeric(logP),y=as.numeric(Perm)))+ geom_point()+ facet_grid(Method~Vehicle,scales="free")+ labs(x="log(Kow)",y="Kp- Permeability") ggsave(paste(getwd(), "/Figures_May1/logKowvPerm.png",sep=""), width = 4, height = 3.5, units = "in") ``` ```{r save_workspace, eval=make.plots.2} save.image(paste0(loc.wd, "MeadeWorkspace", Sys.Date(), ".RData")) ```