--- title: "Basic R-Usage Guide for rMZQC" author: "Chris Bielow " date: '`r Sys.Date()`' output: html_document: default pdf_document: null vignette: > %\VignetteIndexEntry{Basic R-Usage Guide for rMZQC} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- # Basic R-Usage Guide for rMZQC This vignette serves as a quickstart guide for R users to create and save an mzQC document. **Target Audience:** R users ## Read an mzQC file and extract some data ```{r, eval=TRUE} library(rmzqc) data = readMZQC(system.file("./testdata/test.mzQC", package = "rmzqc", mustWork = TRUE)) cat("This file has ", length(data$runQualities), " runqualities\n") cat(" - file: ", data$runQualities[[1]]$metadata$inputFiles[[1]]$name, "\n") cat(" - # of metrics: ", length(data$runQualities[[1]]$qualityMetrics), "\n") cat(" - metric #1 name: ", data$runQualities[[1]]$qualityMetrics[[1]]$name, "\n") cat(" - metric #1 value: ", data$runQualities[[1]]$qualityMetrics[[1]]$value, "\n") ``` Hint: if you receive an error such as ``` Error in parse_con(txt, bigint_as_char) : lexical error: invalid char in json text. cursor_int": [ NaN,NaN,NaN,NaN,825282.0,308263 (right here) ------^ ``` when calling`readMZQC` this indicates that the mzQC is not valid JSON, since `NaN` values should be quoted (`"NaN"`) or replaced by `null` (unquoted), depending on context. In short: `null` may become an `NA` in R if part of an array, see https://github.com/jeroen/jsonlite/issues/70#issuecomment-431433773. ## Create a minimal mzQC document ```{r, eval=TRUE} library(rmzqc) ## we need a proper URI (i.e. no backslashes and a scheme, e.g. 'file:') ## otherwise writing will fail raw_file = localFileToURI("c:\\data\\special.raw", FALSE) file_format = getCVTemplate(accession = filenameToCV(raw_file)) ptxqc_software = toAnalysisSoftware(id = "MS:1003162", version = "1.0.13") ## you could use 'version = packageVersion("PTXQC")' to automate further run1_qc = MzQCrunQuality$new(metadata = MzQCmetadata$new(label = raw_file, inputFiles = list(MzQCinputFile$new(basename(raw_file), raw_file, file_format)), analysisSoftware = list(ptxqc_software)), qualityMetrics = list(toQCMetric(id = "MS:4000059", value = 13405), ## number of MS1 scans toQCMetric(id = "MS:4000063", value = list("MS:1000041" = 1:3, "UO:0000191" = c(0.1, 0.8, 0.1))) # MS2 known precursor charges fractions ) ) mzQC_document = MzQCmzQC$new(version = "1.0.0", creationDate = MzQCDateTime$new(), contactName = Sys.info()["user"], contactAddress = "test@user.info", description = "A minimal mzQC test document with bogus data", runQualities = list(run1_qc), setQualities = list(), controlledVocabularies = list(getCVInfo())) ## write it out mzqc_filename = paste0(getwd(), "/test.mzQC") writeMZQC(mzqc_filename, mzQC_document) cat(mzqc_filename, "written to disk!\n") ## read it again mq = readMZQC(mzqc_filename) ## print some basic stats gettextf("This mzQC was created on %s and has %d quality metric(s) in total.", dQuote(mq$creationDate$datetime), length(mq$runQualities) + length(mq$setQualities)) ``` ## Different Value Types (single value, n-tuple, table, matrix) for QualityMetric's mzQC allows for 4 different kinds of value types for a `qualityMetric`: - Single value - n-tuple - table - matrix See `toQCMetric()` for more examples. Below we will create an example for each of them, so you know what R datastructures to employ. Using a data.frame, instead of a list, will give you the wrong JSON formatting and leads to validation failures. ### Single Value A single value (MS:4000003) is the easiest. Just assign a string or a number to the value attribute: ```{r, eval=TRUE} qualityMetric_singleValue = toQCMetric(id = "MS:4000059", value = 13405) ## number of MS1 scans ## let's look at how this looks in JSON jsonlite::toJSON(qualityMetric_singleValue, pretty = TRUE, auto_unbox = TRUE) ``` ### n-tuple An n-tuple (MS:4000004) is just a vector in R: ```{r, eval=TRUE} qualityMetric_tuple = toQCMetric(id = "MS:4000061", value = c(0.45, 0.76, 0.23)) ## MS1 density quantiles ## let's look at how this looks in JSON jsonlite::toJSON(qualityMetric_tuple, pretty = TRUE, auto_unbox = TRUE) ``` ### Table An table (MS:4000005) is a list (not a data.frame!) of columns. The example below shows a table with two columns and three rows: ```{r, eval=TRUE} qualityMetric_table = # MS2 known precursor charges fractions toQCMetric(id = "MS:4000063", value = list("MS:1000041" = 1:3, ## charge state "UO:0000191" = c(0.1, 0.8, 0.1))) ## fraction of precursors with that charge ## let's look at how this looks in JSON jsonlite::toJSON(qualityMetric_table, pretty = TRUE, auto_unbox = TRUE) ``` ### Matrix A matrix (MS:4000006) is an R matrix: At the point of writing, there is no matrix CV term yet. So we just make one up: ```{r, eval=TRUE} qualityMetric_matrix = toQCMetric(id = "MS:40000??", value = matrix(1:9, 3, 3), allow_unknown_id = TRUE) qualityMetric_matrix$name = "unknown metric" ## let's look at how this looks in JSON jsonlite::toJSON(qualityMetric_matrix, pretty = TRUE, auto_unbox = TRUE) ```