--- title: "Introduction to SegmentedCells" date: "`r BiocStyle::doc_date()`" author: - name: Nicolas Canete affiliation: - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia email: nicolas.canete@sydney.edu.au - name: Ellis Patrick affiliation: - &WIMR Westmead Institute for Medical Research, University of Sydney, Australia - School of Mathematics and Statistics, University of Sydney, Australia email: ellis.patrick@sydney.edu.au package: "`r BiocStyle::pkg_ver('spicyR')`" vignette: > %\VignetteIndexEntry{"Introduction to SegmentedCells"} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(BiocStyle) ``` # Installation ```{r, eval = FALSE} if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("spicyR") ``` # Overview A `SegmentedCells` is an object designed to store data from imaging cytometry (FISH, IMC, CycIF, spatial transcriptomics, ... ) that has already been segmented and reduced to individual cells. A `SegmentedCells` extends DataFrame and defines methods that take advantage of DataFrame nesting to represent various elements of cell-based experiments with spatial orientation that are commonly encountered. This object is able to store information on a cell's spatial location, cellType, morphology, intensity of gene/protein marks as well as image level phenotype information. Ideally this type of data can be used for cell clustering, point process models or nearest neighbour analysis. Below we will consider a few examples of data formats that can be transformed into a `SegmentedCells`. First, load the `spicyR` package. ```{r setup, message=FALSE} library(spicyR) library(S4Vectors) ``` # Example 1 - Data resembles cellProfiler output Here we create a `SegmentedCells` from data that was output from cellProfiler or similar programs. This assumes that there are columns with the string *AreaShape_* and *Intensity_Mean* and that there are `ObjectNumber` and `ImageNumber` columns. Here we create toy cellProfiler data. ```{r} ### Something that resembles cellProfiler data set.seed(51773) n = 10 cells <- data.frame(row.names = seq_len(n)) cells$ObjectNumber <- seq_len(n) cells$ImageNumber <- rep(1:2,c(n/2,n/2)) cells$AreaShape_Center_X <- runif(n) cells$AreaShape_Center_Y <- runif(n) cells$AreaShape_round <- rexp(n) cells$AreaShape_diameter <- rexp(n, 2) cells$Intensity_Mean_CD8 <- rexp(n, 10) cells$Intensity_Mean_CD4 <- rexp(n, 10) ``` We can then create a `SegmentedCells` object. ```{r} cellExp <- SegmentedCells(cells, cellProfiler = TRUE) cellExp ``` Extract the cellSummary information and overwrite it as well. ```{r} cellSum <- cellSummary(cellExp) head(cellSum) cellSummary(cellExp) <- cellSum ``` We can then set the cell types of each cell by extracting and clustering marker intensity information. ```{r} markers <- cellMarks(cellExp) kM <- kmeans(markers,2) cellType(cellExp) <- paste('cluster',kM$cluster, sep = '') cellSum <- cellSummary(cellExp) head(cellSum) ``` # Example 2 - Three pancreatic islets from from Damond et al (2019) Read in data. ```{r} isletFile <- system.file("extdata","isletCells.txt.gz", package = "spicyR") cells <- read.table(isletFile, header = TRUE) ``` We can then create a `SegmentedCells` object. ```{r} cellExp <- SegmentedCells(cells, cellProfiler = TRUE) cellExp ``` We can then set the cell types of each cell by extracting and clustering marker intensity information. ```{r} markers <- cellMarks(cellExp) kM <- kmeans(markers,4) cellType(cellExp) <- paste('cluster',kM$cluster, sep = '') cellSum <- cellSummary(cellExp) head(cellSum) ``` Here is a very simple plot in ggplot showing the spatial distribution of the cell types ```{r, fig.width=5, fig.height= 6} plot(cellExp, imageID=1) ``` # Example 3 - Custom markerintensity and morphology column names Here we create toy data that has a slightly more fluid naming stucture. ```{r} set.seed(51773) n = 10 cells <- data.frame(row.names = seq_len(n)) cells$cellID <- seq_len(n) cells$imageCellID <- rep(seq_len(n/2),2) cells$imageID <- rep(1:2,c(n/2,n/2)) cells$x <- runif(n) cells$y <- runif(n) cells$shape_round <- rexp(n) cells$shape_diameter <- rexp(n, 2) cells$intensity_CD8 <- rexp(n, 10) cells$intensity_CD4 <- rexp(n, 10) cells$cellType <- paste('cluster',sample(1:2,n,replace = TRUE), sep = '_') ``` We can then create a `SegmentedCells` object. ```{r} cellExp <- SegmentedCells(cells, cellTypeString = 'cellType', intensityString = 'intensity_', morphologyString = 'shape_') cellExp ``` Extract morphology information ```{r} morph <- cellMorph(cellExp) head(morph) ``` ## Phenotype information We can also include phenotype information for each image. Create some corresponding toy phenotype information which must have a `imageID` variable. ```{r} phenoData <- DataFrame(imageID = c('1','2'), age = c(21,81), status = c('dead','alive')) imagePheno(cellExp) <- phenoData imagePheno(cellExp) imagePheno(cellExp, expand = TRUE) ``` # Example 4 - Minimal example, cells only have spatial coordinates Here we generate data where we only know the location of each cell. ```{r} set.seed(51773) n = 10 cells <- data.frame(row.names = seq_len(n)) cells$x <- runif(n) cells$y <- runif(n) cellExp <- SegmentedCells(cells) cellExp ``` Extract the cellSummary information which now also has cellIDs and imageIDs. ```{r} cellSum <- cellSummary(cellExp) head(cellSum) ``` # sessionInfo() ```{r} sessionInfo() ```