--- title: "Spatial Data Quality Control on Protein data" author: "Dario Righelli" date: "`r BiocStyle::doc_date()`" output: BiocStyle::html_document: toc: true toc_float: true vignette: > %\VignetteIndexEntry{Spatial Data Quality Control on Protein data} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} # Set chunk options: suppress echo, messages, and warnings in code output knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE) ``` # Introduction This vignette introduces the `SpaceTrooper` package for spatial data analysis from platforms like **CosMx** on Protein assay. # Installation To install `SpaceTrooper`, use the following commands: ```{r install, eval=FALSE} # Install BiocManager if not already installed, then install SpaceTrooper if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("drighelli/SpaceTrooper") ``` # Data Loading In this section, we load data from various platforms using the package's functions. The goal is to provide a uniform `SpatialExperiment` object across all technologies, allowing for consistent QC analysis. The functions in `SpaceTrooper` compute missing metrics as needed and allow for the inclusion of polygons with the `keep_polygons` argument. This stores polygons in the `colData` of the `SpatialExperiment`. ```{r load-data} # Load the SpaceTrooper library library(SpaceTrooper) # Load Xenium data into a Spatial Experiment object (SPE) protfolder <- system.file( "extdata", "S0_prot", package="SpaceTrooper") (spe <- readCosmxProteinSPE(protfolder)) colData(spe) ``` # Data Analysis for CosMx protein The package offers several functions for spatial data analysis, including quality control and visualization. This tutorial focuses on CosMx protein data, which provides Fields of View (FoVs) with cell identifiers. Note that FoVs are unique to CosMx. Additionally, even if not tested, the same approach can be extended on Akoya CODEX data, as far as a `SpatialExperiment` object is created. Polygons can be loaded later if needed. # Field of Views (FOVs) Visualization The `plotCellsFovs` function shows a map of the FoVs within an experiment. This plot is specific to CosMx data and uses cell centroids. Please keep in mind, that this specific experiment had unaligned `fov_positions` and cell centroids positions. An alignment approach, can be found at the end of the `scripts/datacreation.R` file. ```{r plot-fovs} # Plot the cells within their respective Field of Views (FOVs) plotCellsFovs(spe) ``` Because the dataset is a subset of just one Field of View of the original experiment, we are able to see the identifier of the FoV in black and the centroids of the cells in purple. When an experiment has multiple FoVs, you can see the map and the topological organization of the FoVs, together with their identifiers. # Quality control The `spatialPerCellQC` function, inspired by `scater::addPerCellQC`, computes additional metrics for each cell in the `SpatialExperiment`. It also allows for the detection of negative control probes, which is crucial for QC. By default, it automatically removes 0 counts cells, but this can be handled with the `rmZeros` argument. Here, for transparency, we specified the `negProbList` for CosMx protein assays, but the algorithm has already a set of negative probes for the mostly used probes in multiple technologies. Notice that despite the same approach can be applied to CODEX data, it is not provided a list of negative probes for this technology, so the user needs to specify them. ```{r cosmx-analysis-qc} # Perform per-cell quality control checks spe <- spatialPerCellQC(spe, negProbList=c("Ms IgG1", "Rb IgG")) names(colData(spe)) ``` # Metrics Histograms You can investigate individual metrics by viewing their histograms. For outliers, use the `use_fences` argument to display the fences computed by `computeSpatialOutlier` (see next chunk). ```{r plot-metrics} # Plot a histogram of counts (sum) plotMetricHist(spe, metric="sum") # Plot a histogram of cell areas (Area_um) plotMetricHist(spe, metric="Area_um") # Plot a histogram of proportion of counts respect to the cell area in micron plotMetricHist(spe, metric="log2CountArea") # Plot a histogram of proportion of negative probe counts respect to the total # counts in cells plotMetricHist(spe, metric="log2Ctrl_total_ratio") ``` These plots show, respectively, the distributions of the total counts (`sum`), of the Area in micron (`Area_um`), the relationship between the counts and the Area of each cell (`log2CountArea`) and the proportion between the negative probes counts and the total counts of each cell (`log2Ctrl_total_ratio`). # Spatial Outlier Detection Spatial outlier detection is another critical step in QC. While the flag score addresses some metrics, other outlier detection methods may be needed. The `computeSpatialOutlier` function allows the computation of the medcouple statistics on a specified metric (`compute_by` argument). The medcouple is specifically designed for symmetric distributions, indeed the function stamps a warning message when this requisite is not satisfied. It can also use `scuttle::isOutlier` for asymmetric distributions. The `method` argument supports `mc`, `scuttle`, or `both`. This outlier detection approach can be used to decide if and which cells can be discarded on a singular metric. ```{r cosmx-analysis-outlier, warning=FALSE} # Identify spatial outliers based on cell area (Area_um) spe <- computeSpatialOutlier(spe, computeBy="Area_um", method="both") # Identify spatial outliers based on mean DAPI intensity spe <- computeSpatialOutlier(spe, computeBy="Mean.DAPI", method="both") names(colData(spe)) ``` If we computed outliers with the `computeSpatialOutlier` function, we can also visualize which fences have been used to create the filter on the cells. ```{r plot-metrics-2} # Plot a histogram with fences to identify outliers using the medcouple plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_mc") # Plot a histogram with fences to identify outliers using scuttle plotMetricHist(spe, metric="Area_um", useFences="Area_um_outlier_sc") # Plot a histogram with fences to identify outliers using the medcouple plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_mc") # Plot a histogram with fences to identify outliers using scuttle plotMetricHist(spe, metric="Mean.DAPI", useFences="Mean.DAPI_outlier_sc") ``` We visualize the fences computed with medcouple and scuttle outlier detection approaches, to directly inspect differences and the amount of detected outlier each method detected. If we want, we can already use these fences to remove the computed outliers. # The Quality Score Next, we use `computeQScore` to calculate a flag score based on previously computed metrics. The flag score combines **transcript counts** related to **cell area**, the **aspect ratio** of each cell, and its **distance from the FoV border** (only for CosMx, this last one is not used otherwise). See the `help(computeQScore)` details section for additional details. ```{r cosmx-analysis-score} # Calculate quality scores for each cell spe <- computeQScore(spe) names(colData(spe)) ``` Logical filters can then be computed using `computeQScoreFlags`, which requires thresholds for various metrics. Currently, the function considers: * **Flag Score (`qs_threshold`)**: Cells with scores below this threshold (default 0.5) are flagged for exclusion. This value can be used to indicate the quantile for the filtering when setting the `use_fs_quantiles` argument to `TRUE`. * **Flag Score Quantiles (`use_qs_quantiles`)**: Option to filter based on quantiles (default `FALSE`). ```{r cosmx-analysis-score2} # Compute flags to identify cells for filtering spe <- computeQScoreFlags(spe, qsThreshold=0.5) names(colData(spe)) table(spe$low_qscore) ``` We detected 61 cells to be removed. # Additional metrics to filter out cells While for other metrics such as the total counts and the negative prob ratio, the function `computeThresholdFlags` considers: * **Total Counts (`total_threshold`)**: Minimum count threshold (default 0). * **Negative Probe Ratio (`ctrl_tot_ratio_threshold`)**: Minimum ratio of negative probes to total counts (default 0.1). ```{r thresholds-outliers} spe <- computeThresholdFlags(spe, totalThreshold=0, ctrlTotRatioThreshold=0.1) table(spe$threshold_flags) ``` In this example, we don't find any cell to be removed. # Adding Polygon and Visualization To better understand the quality score values we start to load the polygons, giving us a better overview of the cells characteristics. We can load and add polygons to the SPE object using the following functions. Each technology has its own `readPolygons` function to standardize the loaded `sf` object and handle different file types. ```{r plot-polygons} # Read polygon data associated with cells in the SPE # the polygon file path is stored in the spe metadata pols <- readPolygonsCosmx(metadata(spe)$polygons) # Add the polygon data to the SPE object spe <- addPolygonsToSPE(spe, pols) ``` Once the polygons are stored in an `sf` object within `colData`, they can be visualized using functions based on the `ggplot2` library. ```{r plot-polygons-fov-1} # Plot the polygons of the selected cells plotPolygons(spe, bgColor="white") ``` Showing the cells on a white background for better visualization. ```{r plot-polygons-fov-2} # Plot polygons colored by cell area plotPolygons(spe, colourBy="log2AspectRatio") plotPolygons(spe, colourBy="Area_um") ``` We can see in `yellow` and `darkviolet` that there are few cells with extreme values of `log2AspectRatio` and `Area:um` in micron. ```{r plot-polygons-fov-3} plotPolygons(spe, colourBy="quality_score") plotPolygons(spe, colourBy="low_qscore") ``` We can see that the quality score is able to detect both these aspects and highlight the cells that are mostly isolated on the FoV border or showing a weird confomation. We always recommend to be aware of the cell populations in the under-study context, before proceeding to remove the detected cells. # Fov Zoom and Map The `plotZoomFovsMap` function allows you to visualize a map of the FoVs with a zoom-in of selected FoVs, colored by the `colour_by` argument. ```{r plot-zoom-fov} plotZoomFovsMap(spe, fovs=60, colourBy="quality_score") plotZoomFovsMap(spe, fovs=60, colourBy="low_qscore") ``` We see on the left side the map of all the FoVs (only the FoV 16 in this case), together with the poligons on the right, coloured by the quality score. Allowing us to have a better view of a specific tissue area in the whole experiment. # Conclusion In this vignette, we explored the main functionalities of the `SpaceTrooper` package for spatial data analysis. Main steps shown are: * data and polygons loading for CosMx Protein, CosMx * quality control: + outlier detection: medcouple and scuttle MAD + flag score: a score combining **transcript counts**, **cell area**, **aspect ratio** and **distance from the FoV border** * visualization: + centroids: with ggplot2 + polygons: sf + ggplot2 # Session Information ```{r sessionInfo} sessionInfo() ```