library(BiocStyle)
library(HPAanalyze)
library(dplyr)
library(xml2)
The Human Protein Atlas allows you to download very detailed data for each protein in the form of an xml file, and hpaXmlGet and hpaXml allow you to retrieve those files automatically from the HPA server and parse them. However, due to a technical limitation, you cannot save the resulting "xml_document"/"xml_node" objects the way you save ordinary R objects: they hold pointers to memory that are only valid in the current R session. The question is: how do you keep a version of these files to use when you are not connected to the internet, or for reproducibility?
If you look at the “Downloadable data” page on the HPA website, you will see how these files are made available. Basically, you add [ensembl_id].xml to http://www.proteinatlas.org/ to download individual entries (that’s what hpaXmlGet does behind the scenes), or you can download the complete set at once.
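The URL pattern described above can be sketched in a few lines of base R. Note that hpa_xml_url() and hpa_xml_download() are hypothetical helpers written for this illustration, not part of HPAanalyze, and the "data" directory is only an assumed location for the local copies.

```r
# Hypothetical helper: build the download URL for one Ensembl gene ID.
hpa_xml_url <- function(ensembl_id, base_url = "http://www.proteinatlas.org") {
  paste0(base_url, "/", ensembl_id, ".xml")
}

# Hypothetical helper: download an entry only if no local copy exists yet,
# and return the path to the local file either way.
hpa_xml_download <- function(ensembl_id, dir = "data") {
  dest <- file.path(dir, paste0(ensembl_id, ".xml"))
  if (!file.exists(dest)) {
    dir.create(dir, showWarnings = FALSE, recursive = TRUE)
    download.file(hpa_xml_url(ensembl_id), destfile = dest, mode = "wb")
  }
  dest
}

# Building the URL needs no internet connection.
hpa_xml_url("ENSG00000134057")
```

Calling hpa_xml_download("ENSG00000134057") would then fetch the file once (this step does need a connection) and reuse the local copy on later runs.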
From there, you can import the file using xml2::read_xml(). The output should be exactly the same as that of hpaXmlGet.
## same as hpaXmlGet("ENSG00000134057")
CCNB1xml <- xml2::read_xml("data/ENSG00000134057.xml")
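If the document is already in memory rather than on disk, one possible alternative is to write it out with xml2::write_xml() and read it back later. The following is a minimal sketch that uses a small toy document instead of a real HPA entry; the element names and the temporary file path are placeholders chosen for the illustration.

```r
library(xml2)

# Toy stand-in for a downloaded entry (not the real HPA schema).
toy_doc <- read_xml("<proteinAtlas><entry><name>CCNB1</name></entry></proteinAtlas>")

# Write the document to disk as plain xml text ...
xml_path <- tempfile(fileext = ".xml")
write_xml(toy_doc, xml_path)

# ... and import it again, in this session or a later one.
toy_restored <- read_xml(xml_path)
xml_text(xml_find_first(toy_restored, "//name"))
```

Because the file on disk is ordinary xml text, it can be kept alongside your analysis scripts and re-imported whenever needed.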
hpaXml functions
Since the umbrella function hpaXml takes either the Ensembl ID or the imported xml_document object, you can feed it what you just imported and get the expected result.
CCNB1_parsed <- hpaXml(CCNB1xml)
You can use the other hpaXml functions as well.
hpaXmlProtClass(CCNB1xml)
hpaXmlTissueExprSum(CCNB1xml)
hpaXmlAntibody(CCNB1xml)
hpaXmlTissueExpr(CCNB1xml)
It is recommended that you save your parsed objects for reproducibility. Unlike the xml_document object, these parsed objects are just regular R lists of standard vectors or data frames. You can save them as usual.
saveRDS(CCNB1_parsed, "data/CCNB1_parsed.rds")
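To see why this works for parsed objects, here is a minimal round-trip sketch. The list below is a made-up stand-in for the output of hpaXml (its element names and values are invented for the illustration), and a temporary file is used instead of the "data" directory.

```r
# Made-up stand-in for a parsed object: a plain list of data frames/vectors.
toy_parsed <- list(
  ProtClass = data.frame(
    id = c("a", "b"),
    name = c("class A", "class B"),
    stringsAsFactors = FALSE
  ),
  TissueExprSum = list(summary = "toy summary")
)

# Save to disk, then load it back as you would in a later session.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_parsed, rds_path)
toy_reloaded <- readRDS(rds_path)

# The reloaded object should match the original.
all.equal(toy_parsed, toy_reloaded)
```

The same pattern applies to CCNB1_parsed: load it later with readRDS("data/CCNB1_parsed.rds") without contacting the HPA server.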
Anh Tran, 2018-2020
Please cite: Tran, A.N., Dussaq, A.M., Kennell, T. et al. HPAanalyze: an R package that facilitates the retrieval and analysis of the Human Protein Atlas data. BMC Bioinformatics 20, 463 (2019) https://doi.org/10.1186/s12859-019-3059-z