CytoMethIC-Oncology is a collection of machine learning models for oncology. It includes models for CNS tumor classification, pan-cancer classification, cell of origin classification, and subtype classification.
The available models are listed below:
| EHID | ModelID | PredictionLabel |
|---|---|---|
| EH8423 | CancerCellOfOrigin21_rfc | Cell of origin defined in TCGA (N=21) |
| NA | CancerType33_InfHum3_20230807 | TCGA cancer types (N=33) |
| EH8398 | CancerType33_mlp | TCGA cancer types (N=33) |
| EH8395 | CancerType33_rfc | TCGA cancer types (N=33) |
| NA | CancerType33_rfcTCGA_InfHum3 | TCGA cancer types (N=33) |
| EH8396 | CancerType33_svm | TCGA cancer types (N=33) |
| EH8397 | CancerType33_xgb | TCGA cancer types (N=33) |
| NA | CancerType33_xgbTCGA_InfHum3 | TCGA cancer types (N=33) |
| EH8402 | CNSTumor66_mlp | CNS Tumor Class (N=66) |
| EH8399 | CNSTumor66_rfc | CNS Tumor Class (N=66) |
| NA | CNSTumor66_rfcCapper_InfHum3 | CNS Tumor Class (N=66) |
| EH8400 | CNSTumor66_svm | CNS Tumor Class (N=66) |
| EH8401 | CNSTumor66_xgb | CNS Tumor Class (N=66) |
| NA | CNSTumor66_xgbCapper_InfHum3 | CNS Tumor Class (N=66) |
| EH8422 | Subtype91_rfc | Cancer subtypes defined in TCGA (N=91) |
| NA | TumorPurity_HM450 | Tumor purity (%) |
| NA | TumorPurity_HM450_20240318 | Tumor purity (%) |
To load a model from ExperimentHub, index the hub with the model's EHID from the table above, for example ExperimentHub()[["EH8395"]].
Models whose EHID is NA are not on ExperimentHub; they are available in the accompanying GitHub repository. You can download them directly and load them with readRDS(). Examples follow.
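The two loading routes described above can be wrapped in one small helper. This is only a sketch: `load_cmi_model` is a hypothetical name, not part of CytoMethIC, and the `.rds` path in the usage comment is a placeholder for wherever you saved a file downloaded from the repository.

```r
# Hypothetical helper, not part of CytoMethIC: load a model either from
# ExperimentHub (when the argument looks like an EHID) or from a local .rds file.
load_cmi_model <- function(x) {
    if (grepl("^EH[0-9]+$", x)) {
        # Needs the ExperimentHub package; the resource is downloaded and
        # cached the first time it is requested.
        ExperimentHub::ExperimentHub()[[x]]
    } else {
        # Anything else is treated as a path to a downloaded model file.
        stopifnot(file.exists(x))
        readRDS(x)
    }
}

# Usage (not run here; the local path is a placeholder):
# model <- load_cmi_model("EH8395")
# model <- load_cmi_model("path/to/downloaded_model.rds")
```

Dispatching on the EHID pattern keeps calling code the same whichever source a model comes from.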
The snippet below demonstrates the model abstraction: the same cmi_predict() call works for both the random forest and the support vector machine models that CytoMethIC provides on ExperimentHub.
## impute missing beta values before prediction
library(sesame)
library(CytoMethIC)
betas = imputeBetas(sesameDataGet("HM450.1.TCGA.PAAD")$betas)
model = ExperimentHub()[["EH8395"]] # Random forest model
cmi_predict(betas, model)
## $response
## [1] "PAAD"
##
## $prob
## PAAD
## 0.852
model = ExperimentHub()[["EH8396"]] # Support vector machine model
cmi_predict(betas, model)
## $response
## [1] "PAAD"
##
## $prob
## betas[, attr(model$terms, "term.labels")]
## 0.9864795
model = ExperimentHub()[["EH8422"]] # Cancer subtype
cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, model)
## $response
## [1] "GI.CIN"
##
## $prob
## GI.CIN
## 0.462
The snippet below uses cmi_predict to predict the cancer's cell of origin.
model = ExperimentHub()[["EH8423"]] # Cell of origin
cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, model)
## $response
## [1] "C20:Mixed (Stromal/Immune)"
##
## $prob
## C20:Mixed (Stromal/Immune)
## 0.768
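Each prediction shown above is a list with a `$response` label and a named `$prob` value. A minimal sketch of collecting such results into one table and flagging weak calls follows. `summarize_predictions` and the 0.5 cutoff are illustrative assumptions, not part of CytoMethIC, and the inputs are typed in by hand from the outputs above rather than computed. The model names attached to them are taken from the table and are likewise an assumption.

```r
# Hypothetical helper, not part of CytoMethIC: tabulate a named list of
# cmi_predict-style results and flag those below a probability cutoff.
summarize_predictions <- function(preds, cutoff = 0.5) {
    prob <- vapply(preds, function(p) unname(p$prob)[1], numeric(1),
                   USE.NAMES = FALSE)
    data.frame(
        model = names(preds),
        response = vapply(preds, function(p) as.character(p$response),
                          character(1), USE.NAMES = FALSE),
        prob = prob,
        confident = prob >= cutoff, # the cutoff is an arbitrary illustrative choice
        stringsAsFactors = FALSE
    )
}

# Results copied by hand from the outputs shown above
preds <- list(
    CancerType33_rfc = list(response = "PAAD", prob = c(PAAD = 0.852)),
    Subtype91_rfc = list(response = "GI.CIN", prob = c(GI.CIN = 0.462)),
    CancerCellOfOrigin21_rfc = list(
        response = "C20:Mixed (Stromal/Immune)",
        prob = c("C20:Mixed (Stromal/Immune)" = 0.768)
    )
)
tab <- summarize_predictions(preds)
print(tab)
```

With this cutoff only the subtype call (0.462) is flagged as not confident. A suitable threshold depends on the model and the application.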
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sesame_1.27.0 sesameData_1.27.0 CytoMethIC_1.5.0
## [4] ExperimentHub_2.17.0 AnnotationHub_3.17.0 BiocFileCache_2.17.0
## [7] dbplyr_2.5.0 BiocGenerics_0.55.0 generics_0.1.3
## [10] knitr_1.50
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4
## [3] blob_1.2.4 filelock_1.0.3
## [5] Biostrings_2.77.0 fastmap_1.2.0
## [7] digest_0.6.37 mime_0.13
## [9] lifecycle_1.0.4 KEGGREST_1.49.0
## [11] RSQLite_2.3.9 magrittr_2.0.3
## [13] compiler_4.5.0 rlang_1.1.6
## [15] sass_0.4.10 tools_4.5.0
## [17] yaml_2.3.10 S4Arrays_1.9.0
## [19] bit_4.6.0 curl_6.2.2
## [21] DelayedArray_0.35.0 plyr_1.8.9
## [23] RColorBrewer_1.1-3 abind_1.4-8
## [25] BiocParallel_1.43.0 withr_3.0.2
## [27] purrr_1.0.4 grid_4.5.0
## [29] stats4_4.5.0 preprocessCore_1.71.0
## [31] wheatmap_0.2.0 e1071_1.7-16
## [33] colorspace_2.1-1 ggplot2_3.5.2
## [35] scales_1.3.0 SummarizedExperiment_1.39.0
## [37] cli_3.6.4 rmarkdown_2.29
## [39] crayon_1.5.3 reshape2_1.4.4
## [41] httr_1.4.7 tzdb_0.5.0
## [43] proxy_0.4-27 DBI_1.2.3
## [45] cachem_1.1.0 stringr_1.5.1
## [47] parallel_4.5.0 AnnotationDbi_1.71.0
## [49] BiocManager_1.30.25 XVector_0.49.0
## [51] matrixStats_1.5.0 vctrs_0.6.5
## [53] Matrix_1.7-3 jsonlite_2.0.0
## [55] IRanges_2.43.0 hms_1.1.3
## [57] S4Vectors_0.47.0 bit64_4.6.0-1
## [59] fontawesome_0.5.3 jquerylib_0.1.4
## [61] glue_1.8.0 codetools_0.2-20
## [63] stringi_1.8.7 gtable_0.3.6
## [65] BiocVersion_3.22.0 GenomeInfoDb_1.45.0
## [67] GenomicRanges_1.61.0 UCSC.utils_1.5.0
## [69] munsell_0.5.1 tibble_3.2.1
## [71] pillar_1.10.2 rappdirs_0.3.3
## [73] htmltools_0.5.8.1 randomForest_4.7-1.2
## [75] GenomeInfoDbData_1.2.14 R6_2.6.1
## [77] evaluate_1.0.3 Biobase_2.69.0
## [79] lattice_0.22-7 readr_2.1.5
## [81] png_0.1-8 memoise_2.0.1
## [83] BiocStyle_2.37.0 bslib_0.9.0
## [85] class_7.3-23 Rcpp_1.0.14
## [87] SparseArray_1.9.0 xfun_0.52
## [89] MatrixGenerics_1.21.0 pkgconfig_2.0.3