Original Authors: Martin Morgan, Sonali Arora, Lori Shepherd
Presenting Author: Martin Morgan
Date: 20 June, 2022
Objective: Gain confidence working with base R commands and data structures.
Lessons learned:
factor(), NA, ?factor, browseVignettes()
Efficient vectorized calculations on ‘atomic’ vectors: logical,
integer, numeric, complex, character, raw
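A short sketch, not part of the original lesson, of the six atomic types and of vectorized arithmetic; the variable names are illustrative only. It also shows that `1:3` is integer while `c(1, 2, 3)` is numeric (double).

```r
## Illustrative vectors, one of each atomic type (names are arbitrary)
lgl <- c(TRUE, FALSE, NA)          # logical (NA is a missing value)
int <- 1:3                         # integer: ':' produces integers
dbl <- c(1.5, 2, 3)                # numeric (stored as "double")
cpx <- c(1+2i, 3-1i)               # complex
chr <- c("a", "b", "c")            # character
rw  <- as.raw(c(1, 255))           # raw bytes

## typeof() reports the storage type of each vector:
## "logical" "integer" "double" "complex" "character" "raw"
vapply(list(lgl, int, dbl, cpx, chr, rw), typeof, character(1))

typeof(c(1, 2, 3))   # "double": numbers written without 'L' are not integers

## Vectorized operations work element-wise on whole vectors, no loop needed
int * dbl                  # 1.5 4.0 9.0
sum(lgl, na.rm = TRUE)     # TRUE counts as 1, FALSE as 0, NA dropped: 1
```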
character_vector <- c("January", "February", "March", "April", "May")
logical_vector <- c(FALSE, FALSE, TRUE, TRUE, TRUE)
integer_vector <- 1:5  # c(1L, 2L, 3L, 4L, 5L)
Atomic vectors are building blocks for more complicated objects
factor – enumeration of possible levels
months <- factor(
    character_vector,     # values realized in 'months'
    levels = c(           # possible values
        "January", "February", "March", "April", "May", "June", "July",
        "August", "September", "October", "November", "December"
    )
)
matrix – atomic vector with ‘dim’ attribute
matrix(1:6, nrow = 3)  # n.b., 'column-major' order
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
data.frame – list of equal-length atomic vectors
data_frame <- data.frame(
    month = months,
    is_spring = logical_vector,
    month_of_year = integer_vector
)
Formal classes represent complicated combinations of vectors,
e.g., the return value of lm(), below
Functions transform inputs to outputs, perhaps with side effects
rnorm(5)
## [1]  4.679075 -1.225512  1.333134 -1.623254  1.260870
Argument matching first by name, then by position
Functions may define (some) arguments to have default values
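A minimal sketch of default values and argument matching, using a hypothetical function `scale_by()` invented for illustration. R matches supplied arguments by exact name first, then by partial name, and only then by position.

```r
## Hypothetical function: 'times' has a default value of 2
scale_by <- function(x, times = 2) {
    x * times
}

scale_by(1:3)                  # default 'times' is used: 2 4 6
scale_by(1:3, 10)              # positional: x = 1:3, times = 10
scale_by(times = 10, x = 1:3)  # named arguments, in any order
scale_by(times = 10, 1:3)      # 'times' matched by name; 1:3 then fills 'x'
scale_by(1:3, ti = 10)         # partial name 'ti' matches 'times' (legal, but fragile)
```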
log(1:5)            # default base = exp(1)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379
log(1:5, base = 10)
## [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700
log(base = 10, 1:5) # named arguments match before unnamed
## [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700
Generic functions dispatch to specific methods based on the class of
their argument(s), e.g., print().
Methods are functions that implement specific generics, e.g.,
print.factor; methods are invoked indirectly, via the generic.
?print        # what does the generic 'print()' do?
?print.factor # what does the method 'print(x)', when x is a factor, do?
Many, but not all, functions able to manipulate a particular class are
methods; e.g., abline(), used below, is a plain old function.
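To make the generic/method distinction concrete, here is a minimal S3 sketch. The generic `describe()` and its methods are hypothetical names invented for this example; the mechanism (`UseMethod()` dispatching to `describe.<class>`, falling back to `describe.default`) is the same one `print()` and `print.factor()` use.

```r
## The generic: UseMethod() looks up a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

## Fallback method, used when no class-specific method exists
describe.default <- function(x, ...)
    paste("an object of class", class(x)[1])

## Method for objects of class 'factor'
describe.factor <- function(x, ...)
    paste("a factor with", nlevels(x), "levels")

describe(1:3)                        # dispatches to describe.default
describe(factor(c("a", "b", "a")))   # dispatches to describe.factor
```

As with `print()`, users call the generic; the methods are invoked indirectly.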
Iteration:
lapply()
args(lapply)
## function (X, FUN, ...) 
## NULL
For X (an atomic vector or list()), apply a
function FUN to each element, returning the result as a
list. ... are additional arguments passed to FUN.
FUN can be a built-in or a user-defined function
lst <- list(a=1:2, b=2:4)
lapply(lst, log)      # 'base' argument default; natural log
## $a
## [1] 0.0000000 0.6931472
## 
## $b
## [1] 0.6931472 1.0986123 1.3862944
lapply(lst, log, 10)  # '10' is second argument to 'log()', i.e., log base 10
## $a
## [1] 0.00000 0.30103
## 
## $b
## [1] 0.3010300 0.4771213 0.6020600
sapply() – like lapply(), but simplify the result to a
vector, matrix, or array, if possible.
vapply() – like sapply(), but requires that the return
type of FUN is specified; this can be safer, because an error is
raised when the result is of an unexpected type.
mapply() (also Map())
args(mapply)
## function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE) 
## NULL
... are one or more vectors, recycled to be of the same
length. FUN is a function that takes as many arguments as
there are vectors in the ... argument. mapply() returns the result of
applying FUN to corresponding elements of those vectors.
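A small sketch of that behaviour; `add()` is a throwaway helper defined only for this example. It shows parallel iteration, recycling of a shorter vector, `MoreArgs` for arguments shared by every call, and `Map()`, which is `mapply()` with `SIMPLIFY = FALSE`.

```r
add <- function(x, y) x + y          # hypothetical helper for illustration

mapply(add, 1:3, 4:6)   # add(1, 4), add(2, 5), add(3, 6): 5 7 9
mapply(add, 1:4, 1:2)   # 1:2 is recycled to 1 2 1 2: 2 4 4 6

## MoreArgs: arguments that are the same for every call
## seq(1, 4, by = 2); seq(2, 5, by = 2); seq(3, 6, by = 2)
mapply(seq, 1:3, 4:6, MoreArgs = list(by = 2), SIMPLIFY = FALSE)

## Map() never simplifies, so it always returns a list
Map(add, 1:3, 4:6)
```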
mapply(seq, 1:3, 4:6, SIMPLIFY=FALSE) # seq(1, 4); seq(2, 5); seq(3, 6)
## [[1]]
## [1] 1 2 3 4
## 
## [[2]]
## [1] 2 3 4 5
## 
## [[3]]
## [1] 3 4 5 6
apply()
args(apply)
## function (X, MARGIN, FUN, ..., simplify = TRUE) 
## NULL
For a matrix or array X, apply FUN to each MARGIN
(dimension; e.g., MARGIN=1 means apply FUN to each row,
MARGIN=2 means apply FUN to each column).
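A minimal sketch of MARGIN on a small integer matrix. When FUN returns more than one value, apply() arranges the per-margin results as the columns of a matrix.

```r
m <- matrix(1:6, nrow = 2)   # 2 rows x 3 columns, filled column-major
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

apply(m, 1, sum)     # MARGIN = 1, one sum per row:    9 12
apply(m, 2, sum)     # MARGIN = 2, one sum per column: 3 7 11
apply(m, 2, range)   # FUN returns c(min, max): a 2 x 3 result
```

For simple sums and means, rowSums(), colSums(), rowMeans() and colMeans() give the same answers and are usually faster.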
Traditional iteration programming constructs: repeat {}, for () {}
Often lapply() is the better choice!
Conditional
if (test) {
    ## code if test is TRUE
} else {
    ## code if test is FALSE
}
Functions (see table below for a few favorites)
fun <- function(x) {
    length(unique(x))
}
## list of length 5, each containing a sample (with replacement) of 50 letters
lets <- replicate(5, sample(letters, 50, TRUE), simplify=FALSE)
sapply(lets, fun)
## [1] 24 22 20 21 22
Introspection
class(), str(), dim()
Help
?"print": help on the generic print
?"print.data.frame": help on the print method for objects of class data.frame
help(package="GenomeInfoDb")
browseVignettes("GenomicRanges")
methods("plot")
methods(class="lm")
The following code chunk illustrates R vectors, vectorized
operations, objects (e.g., data.frame()), formulas, functions,
generics (plot) and methods (plot.formula), class and method
discovery (introspection).
x <- rnorm(1000)                     # atomic vectors
y <- x + rnorm(1000, sd=.5)          # vectorized computation
df <- data.frame(x=x, y=y)           # object of class 'data.frame'
plot(y ~ x, df)                      # generic plot, method plot.formula
fit <- lm(y ~ x, df)                 # object of class 'lm'
anova(fit)                           # see help with ?anova.lm
## Analysis of Variance Table
## 
## Response: y
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## x           1 948.42  948.42  3923.7 < 2.2e-16 ***
## Residuals 998 241.23    0.24                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(y ~ x, df)                      # methods(plot); ?plot.formula
abline(fit, col="red", lwd=3, lty=2) # a plain function, not a generic or method
Use methods() for introspection (class and method discovery), e.g.,
methods(class=class(fit))            # introspection
##  [1] add1           alias          anova          case.names     coerce        
##  [6] confint        cooks.distance deviance       dfbeta         dfbetas       
## [11] drop1          dummy.coef     effects        extractAIC     family        
## [16] formula        hatvalues      influence      initialize     kappa         
## [21] labels         logLik         model.frame    model.matrix   nobs          
## [26] plot           predict        print          proj           qr            
## [31] residuals      rstandard      rstudent       show           simulate      
## [36] slotsFromS3    summary        variable.names vcov          
## see '?methods' for accessing help and source code
Programming example – group 1000 gene SYMBOLs into GO identifiers
The file ‘symgo.csv’ is from an Excel spreadsheet (exported as ‘csv’
– comma-separated value – format) with four columns: the gene
‘SYMBOL’ (e.g., SOX17), the gene ontology (GO) term(s) that the
symbol has been associated with, and additional gene ontology
information (the EVIDENCE code and the ONTOLOGY).
## example data
fl <- file.choose()      ## symgo.csv
symgo <- read.csv(fl, row.names=1, stringsAsFactors=FALSE)
head(symgo)
##      SYMBOL         GO EVIDENCE ONTOLOGY
## 1   PPIAP28       <NA>     <NA>     <NA>
## 2     PTLAH       <NA>     <NA>     <NA>
## 3 HIST1H2BC GO:0000786      NAS       CC
## 4 HIST1H2BC GO:0000788      IBA       CC
## 5 HIST1H2BC GO:0002227      IDA       BP
## 6 HIST1H2BC GO:0003677      IBA       MF
dim(symgo)
## [1] 5041    4
length(unique(symgo$SYMBOL))
## [1] 1000
head(symgo[symgo$SYMBOL == "SOX17",])
##      SYMBOL         GO EVIDENCE ONTOLOGY
## 4576  SOX17 GO:0000122      IEA       BP
## 4577  SOX17 GO:0001525      ISS       BP
## 4578  SOX17 GO:0001570      ISS       BP
## 4579  SOX17 GO:0001706      IDA       BP
## 4580  SOX17 GO:0001828      IEA       BP
## 4581  SOX17 GO:0001947      ISS       BP
How many gene SYMBOLs are associated with each GO term? There are several ways to calculate this…
## split + length
go2sym <- split(symgo$SYMBOL, symgo$GO)
len1 <- lengths(go2sym)
head(len1)
## GO:0000049 GO:0000050 GO:0000060 GO:0000077 GO:0000086 GO:0000118 
##          3          2          1          1          3          1
## smarter built-in functions, e.g., omitting NAs
len2 <- aggregate(SYMBOL ~ GO, symgo, length)
head(len2)   # a data.frame with columns GO and SYMBOL (the count per GO term)
In aggregate(), the third argument is FUN. The value of FUN is
the function that is applied to each group defined by the formula in
the first argument. Provide a ‘custom’ function that uses the unique
lower-case values:
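The difference between counting with split() and with aggregate(), and the use of a custom FUN, can be seen on a tiny hypothetical stand-in for `symgo`; the rows and the GO identifiers `GO:1`, `GO:2` are invented for illustration, and `n_unique_lower()` is a made-up helper.

```r
## Hypothetical miniature of 'symgo'; the rows are invented for illustration
toy <- data.frame(
    SYMBOL = c("SOX17", "sox17", "ASL", "EEF1A1", NA),
    GO     = c("GO:1", "GO:1", "GO:1", "GO:2", "GO:2")
)

## split() + lengths() counts every element, including the NA symbol
lengths(split(toy$SYMBOL, toy$GO))          # GO:1 3, GO:2 2

## the formula interface of aggregate() drops rows containing NA first
aggregate(SYMBOL ~ GO, toy, length)         # GO:1 3, GO:2 1

## a custom FUN: number of distinct symbols, ignoring case
n_unique_lower <- function(x) length(unique(tolower(x)))
aggregate(SYMBOL ~ GO, toy, n_unique_lower) # GO:1 2, GO:2 1
```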
## your own function -- unique, lower-case identifiers
uidfun <- function(x)
    unique(tolower(x))
This illustrates how one is not restricted to ‘built-in’ solutions for solving biological problems.
head(aggregate(SYMBOL ~ GO, symgo, uidfun))
##           GO                SYMBOL
## 1 GO:0000049         yars2, eef1a1
## 2 GO:0000050                   asl
## 3 GO:0000060                 oprd1
## 4 GO:0000077                 pea15
## 5 GO:0000086 tubb4a, cenpf, clasp1
## 6 GO:0000118                  cir1
These case studies serve as refreshers on R input and manipulation of data.
Input a file that contains ALL (acute lymphoblastic leukemia) patient information
fname <- file.choose()   ## "ALLphenoData.tsv"
stopifnot(file.exists(fname))
pdata <- read.delim(fname)
Check out the help page ?read.delim for input options, and explore
basic properties of the object you’ve created, for instance…
class(pdata)
## [1] "data.frame"
colnames(pdata)
##  [1] "id"             "diagnosis"      "sex"            "age"           
##  [5] "BT"             "remission"      "CR"             "date.cr"       
##  [9] "t.4.11."        "t.9.22."        "cyto.normal"    "citog"         
## [13] "mol.biol"       "fusion.protein" "mdr"            "kinet"         
## [17] "ccr"            "relapse"        "transplant"     "f.u"           
## [21] "date.last.seen"
dim(pdata)
## [1] 127  21
head(pdata)
##     id diagnosis sex age BT remission CR   date.cr t.4.11. t.9.22. cyto.normal
## 1 1005 5/21/1997   M  53 B2        CR CR  8/6/1997   FALSE    TRUE       FALSE
## 2 1010 3/29/2000   M  19 B2        CR CR 6/27/2000   FALSE   FALSE       FALSE
## 3 3002 6/24/1998   F  52 B4        CR CR 8/17/1998      NA      NA          NA
## 4 4006 7/17/1997   M  38 B1        CR CR  9/8/1997    TRUE   FALSE       FALSE
## 5 4007 7/22/1997   M  57 B2        CR CR 9/17/1997   FALSE   FALSE       FALSE
## 6 4008 7/30/1997   M  17 B1        CR CR 9/27/1997   FALSE   FALSE       FALSE
##          citog mol.biol fusion.protein mdr   kinet   ccr relapse transplant
## 1      t(9;22)  BCR/ABL           p210 NEG dyploid FALSE   FALSE       TRUE
## 2  simple alt.      NEG           <NA> POS dyploid FALSE    TRUE      FALSE
## 3         <NA>  BCR/ABL           p190 NEG dyploid FALSE    TRUE      FALSE
## 4      t(4;11) ALL1/AF4           <NA> NEG dyploid FALSE    TRUE      FALSE
## 5      del(6q)      NEG           <NA> NEG dyploid FALSE    TRUE      FALSE
## 6 complex alt.      NEG           <NA> NEG hyperd. FALSE    TRUE      FALSE
##                 f.u date.last.seen
## 1 BMT / DEATH IN CR           <NA>
## 2               REL      8/28/2000
## 3               REL     10/15/1999
## 4               REL      1/23/1998
## 5               REL      11/4/1997
## 6               REL     12/15/1997
summary(pdata$sex)
##    Length     Class      Mode 
##       127 character character
summary(pdata$cyto.normal)
##    Mode   FALSE    TRUE    NA's 
## logical      69      24      34
Remind yourselves about various ways to subset and access columns of a data.frame
pdata[1:5, 3:4]
##   sex age
## 1   M  53
## 2   M  19
## 3   F  52
## 4   M  38
## 5   M  57
pdata[1:5, ]
##     id diagnosis sex age BT remission CR   date.cr t.4.11. t.9.22. cyto.normal
## 1 1005 5/21/1997   M  53 B2        CR CR  8/6/1997   FALSE    TRUE       FALSE
## 2 1010 3/29/2000   M  19 B2        CR CR 6/27/2000   FALSE   FALSE       FALSE
## 3 3002 6/24/1998   F  52 B4        CR CR 8/17/1998      NA      NA          NA
## 4 4006 7/17/1997   M  38 B1        CR CR  9/8/1997    TRUE   FALSE       FALSE
## 5 4007 7/22/1997   M  57 B2        CR CR 9/17/1997   FALSE   FALSE       FALSE
##         citog mol.biol fusion.protein mdr   kinet   ccr relapse transplant
## 1     t(9;22)  BCR/ABL           p210 NEG dyploid FALSE   FALSE       TRUE
## 2 simple alt.      NEG           <NA> POS dyploid FALSE    TRUE      FALSE
## 3        <NA>  BCR/ABL           p190 NEG dyploid FALSE    TRUE      FALSE
## 4     t(4;11) ALL1/AF4           <NA> NEG dyploid FALSE    TRUE      FALSE
## 5     del(6q)      NEG           <NA> NEG dyploid FALSE    TRUE      FALSE
##                 f.u date.last.seen
## 1 BMT / DEATH IN CR           <NA>
## 2               REL      8/28/2000
## 3               REL     10/15/1999
## 4               REL      1/23/1998
## 5               REL      11/4/1997
head(pdata[, 3:5])
##   sex age BT
## 1   M  53 B2
## 2   M  19 B2
## 3   F  52 B4
## 4   M  38 B1
## 5   M  57 B2
## 6   M  17 B1
tail(pdata[, 3:5], 3)
##     sex age BT
## 125   M  19 T2
## 126   M  30 T3
## 127   M  29 T2
head(pdata$age)
## [1] 53 19 52 38 57 17
head(pdata$sex)
## [1] "M" "M" "F" "M" "M" "M"
head(pdata[pdata$age > 21,])
##      id diagnosis sex age BT remission CR   date.cr t.4.11. t.9.22. cyto.normal
## 1  1005 5/21/1997   M  53 B2        CR CR  8/6/1997   FALSE    TRUE       FALSE
## 3  3002 6/24/1998   F  52 B4        CR CR 8/17/1998      NA      NA          NA
## 4  4006 7/17/1997   M  38 B1        CR CR  9/8/1997    TRUE   FALSE       FALSE
## 5  4007 7/22/1997   M  57 B2        CR CR 9/17/1997   FALSE   FALSE       FALSE
## 10 8001 1/15/1997   M  40 B2        CR CR 3/26/1997   FALSE   FALSE       FALSE
## 11 8011 8/21/1998   M  33 B3        CR CR 10/8/1998   FALSE   FALSE       FALSE
##           citog mol.biol fusion.protein mdr   kinet   ccr relapse transplant
## 1       t(9;22)  BCR/ABL           p210 NEG dyploid FALSE   FALSE       TRUE
## 3          <NA>  BCR/ABL           p190 NEG dyploid FALSE    TRUE      FALSE
## 4       t(4;11) ALL1/AF4           <NA> NEG dyploid FALSE    TRUE      FALSE
## 5       del(6q)      NEG           <NA> NEG dyploid FALSE    TRUE      FALSE
## 10     del(p15)  BCR/ABL           p190 NEG    <NA> FALSE    TRUE      FALSE
## 11 del(p15/p16)  BCR/ABL      p190/p210 NEG dyploid FALSE   FALSE       TRUE
##                  f.u date.last.seen
## 1  BMT / DEATH IN CR           <NA>
## 3                REL     10/15/1999
## 4                REL      1/23/1998
## 5                REL      11/4/1997
## 10               REL      7/11/1997
## 11 BMT / DEATH IN CR           <NA>
It seems from the output below that there are 17 females over 40 in the data set,
but when subsetting pdata to contain just those individuals, 19 rows
are selected. Why? What can we do to correct this?
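As a hint, here is what happens when a logical index contains NA, shown on a small hypothetical data frame (not the ALL data) in which one age is missing. table() silently excludes NA, but `[` turns each NA index into an extra row of NAs; which() or subset() treat NA as FALSE.

```r
## Hypothetical toy data: the second individual has a missing age
toy <- data.frame(sex = c("F", "F", "M", "F"), age = c(45, NA, 50, 30))

idx <- toy$sex == "F" & toy$age > 40
idx                         # TRUE NA FALSE FALSE: TRUE & NA is NA
table(idx)                  # FALSE 2, TRUE 1: the NA is not counted

nrow(toy[idx, ])            # 2: the NA index produces an all-NA row
nrow(toy[which(idx), ])     # 1: which() keeps only the TRUE positions
nrow(subset(toy, sex == "F" & age > 40))   # 1: subset() drops NA rows
```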
idx <- pdata$sex == "F" & pdata$age > 40
table(idx)## idx
## FALSE  TRUE 
##   108    17
dim(pdata[idx,])
## [1] 19 21
Use the mol.biol column to subset the data to contain just
individuals with ‘BCR/ABL’ or ‘NEG’, e.g.,
bcrabl <- pdata[pdata$mol.biol %in% c("BCR/ABL", "NEG"),]
When mol.biol is a factor (read.delim() created factors by default before
R 4.0.0; here it is a character vector), it retains all levels even after
subsetting. How might you drop the unused factor levels?
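A minimal sketch on a hypothetical factor: subsetting keeps the original levels, while re-calling factor() or using droplevels() keeps only the levels actually present.

```r
## Hypothetical factor with three levels
f <- factor(c("BCR/ABL", "NEG", "ALL1/AF4", "NEG"))
levels(f)                      # "ALL1/AF4" "BCR/ABL" "NEG"

keep <- f[f %in% c("BCR/ABL", "NEG")]
levels(keep)                   # still all three levels

levels(factor(keep))           # "BCR/ABL" "NEG"
levels(droplevels(keep))       # same result with droplevels()
```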
bcrabl$mol.biol <- factor(bcrabl$mol.biol)
The BT column describes B- and T-cell subtypes. It was read as a
character vector rather than a factor, so it has no levels:
levels(bcrabl$BT)
## NULL
How might one collapse B1, B2, … to a single type B, and likewise for T1, T2, …, so there are only two subtypes, B and T?
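A minimal sketch on a hypothetical vector of subtypes. Assigning duplicated names to the levels of a factor merges those levels; a character vector must first be converted with factor(), otherwise there are no levels to rename.

```r
bt <- c("B1", "B2", "T1", "B", "T3")     # hypothetical subtypes
levels(bt)                               # NULL: a character vector has no levels

bt_f <- factor(bt)
levels(bt_f)                             # "B" "B1" "B2" "T1" "T3"
levels(bt_f) <- substring(levels(bt_f), 1, 1)  # "B" "B" "B" "T" "T": duplicates merge
levels(bt_f)                             # "B" "T"
table(bt_f)                              # B 3, T 2

## one-step alternative working directly on the character values
table(substring(bt, 1, 1))
```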
table(bcrabl$BT)
## 
##  B B1 B2 B3 B4  T T1 T2 T3 T4 
##  4  9 35 22  9  4  1 15  9  2 
bcrabl$BT <- factor(bcrabl$BT)   # BT was read as character; 'levels<-' needs a factor
levels(bcrabl$BT) <- substring(levels(bcrabl$BT), 1, 1)
table(bcrabl$BT)
## 
##  B  T 
## 79 31 
Use xtabs() (cross-tabulation) to count the number of samples with
B- and T-cell types in each of the BCR/ABL and NEG groups
xtabs(~ BT + mol.biol, bcrabl)
##    mol.biol
## BT  BCR/ABL NEG
##   B      37  42
##   T       0  31
the BCR/ABL and NEG treatment groups.
aggregate(age ~ mol.biol + sex, bcrabl, mean)##   mol.biol sex      age
## 1  BCR/ABL   F 39.93750
## 2      NEG   F 30.42105
## 3  BCR/ABL   M 40.50000
## 4      NEG   M 27.21154
Use t.test() to compare the age of individuals in the BCR/ABL versus
NEG groups; visualize the results using boxplot(). In both cases,
use the formula interface. Consult the help page ?t.test and re-do
the test assuming that the variance of ages in the two groups is
identical. What parts of the test output change?
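For orientation, here is a sketch on simulated data (not the ALL data; group sizes, means and standard deviations are arbitrary choices) comparing the default Welch test with the pooled-variance test requested by `var.equal = TRUE`. With unequal group sizes, as here, the method name, t statistic, degrees of freedom, p-value and confidence interval change, while the estimated group means do not.

```r
set.seed(1)
sim <- data.frame(
    age   = c(rnorm(15, mean = 40, sd = 12), rnorm(25, mean = 30, sd = 6)),
    group = rep(c("A", "B"), times = c(15, 25))
)

welch  <- t.test(age ~ group, sim)                    # default: var.equal = FALSE
pooled <- t.test(age ~ group, sim, var.equal = TRUE)  # pooled-variance t-test

welch$method; pooled$method                             # the test name changes
c(welch = welch$parameter, pooled = pooled$parameter)   # df: Welch vs 15 + 25 - 2 = 38
c(welch = welch$statistic, pooled = pooled$statistic)   # t statistic
c(welch = welch$p.value, pooled = pooled$p.value)       # p-value
welch$estimate; pooled$estimate                         # group means are identical
```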
t.test(age ~ mol.biol, bcrabl)
## 
##  Welch Two Sample t-test
## 
## data:  age by mol.biol
## t = 4.8172, df = 68.529, p-value = 8.401e-06
## alternative hypothesis: true difference in means between group BCR/ABL and group NEG is not equal to 0
## 95 percent confidence interval:
##   7.13507 17.22408
## sample estimates:
## mean in group BCR/ABL     mean in group NEG 
##              40.25000              28.07042
boxplot(age ~ mol.biol, bcrabl)
This case study is a second walk through basic data manipulation and visualization skills. We use data from the US Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System (BRFSS) annual survey. Check out the web page for a little more information. We are using a small subset of these data, including a random sample of 10000 observations from each of 1990 and 2010.
Input the data using read.csv(), creating a variable brfss to hold
it. Use file.choose() to locate the data file BRFSS-subset.csv
fname <- file.choose()   ## BRFSS-subset.csv
stopifnot(file.exists(fname))
brfss <- read.csv(fname)
Base plotting functions
Explore the data using class(), dim(), head(), summary(),
etc. Use xtabs() to summarize the number of males and females in
the study, in each of the two years.
Use aggregate() to summarize the average weight in each sex and
year.
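One way to approach these two summaries, sketched on a tiny hypothetical stand-in for `brfss`; the values are invented, only the column names Year, Sex and Weight are taken from the real data.

```r
## Hypothetical stand-in for 'brfss' (values invented for illustration)
toy <- data.frame(
    Year   = c(1990, 1990, 2010, 2010, 2010),
    Sex    = c("Female", "Male", "Female", "Male", "Male"),
    Weight = c(60, 80, 70, NA, 90)
)

xtabs(~ Sex + Year, toy)                    # number of individuals per sex and year
aggregate(Weight ~ Sex + Year, toy, mean)   # mean weight; rows with NA Weight dropped
```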
Create a scatterplot showing the relationship between the square
root of weight and height, using the plot() function and the
main argument to annotate the plot. Note the transformed
Y-axis. Experiment with different plotting symbols (try the command
example(points) to view different points).
plot(sqrt(Weight) ~ Height, brfss, main="All Years, Both Sexes")
Color the female and male points differently. To do this, use the
col argument to plot(). Provide as a value to that argument a
vector of colors, subset by brfss$Sex.
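A minimal sketch of the color lookup on hypothetical data (values invented; the red/blue choice is arbitrary). Indexing a named color vector by the character values of Sex gives one color per row. The as.character() call matters if Sex is a factor, because indexing by a factor uses its integer codes rather than its labels.

```r
## Hypothetical data with the same column names as 'brfss'
toy <- data.frame(
    Height = c(160, 180, 165, 175),
    Weight = c(60, 85, 62, 78),
    Sex    = c("Female", "Male", "Female", "Male")
)

sex_col <- c(Female = "red", Male = "blue")   # one color per sex
pt_col  <- sex_col[as.character(toy$Sex)]     # one color per row, looked up by name

plot(sqrt(Weight) ~ Height, toy, col = pt_col, main = "Colored by sex")
legend("topleft", legend = names(sex_col), col = sex_col, pch = 1)
```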
Create a subset of the data containing only observations from 2010.
brfss2010 <- brfss[brfss$Year == "2010", ]
Create a single figure with two panels, one for each sex. Do this by
using the par() function with the mfcol argument before calling
plot(). You’ll need to create two more subsets of the data, perhaps
when you are providing the data to the function plot().
opar <- par(mfcol=c(1, 2))
plot(sqrt(Weight) ~ Height, brfss2010[brfss2010$Sex == "Female", ],
     main="2010, Female")
plot(sqrt(Weight) ~ Height, brfss2010[brfss2010$Sex == "Male", ],
     main="2010, Male")
par(opar)                           # reset 'par' to original value
Plotting large numbers of points means that they are often
over-plotted, potentially obscuring important patterns. Experiment
with arguments to plot() to address over-plotting, e.g.,
pch='.' or semi-transparent colors such as
col=adjustcolor("black", alpha.f=.4). Try using the smoothScatter() function
(the data have to be presented as x and y, rather than as a
formula). Try adding the hexbin package to your R session
(using library()) and creating a hexbinplot().
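A base-graphics sketch of these options on simulated data (the point count and noise level are arbitrary). Base plot() has no alpha argument; transparency comes from a color with an alpha channel, for example from adjustcolor(). smoothScatter() relies on the KernSmooth package, which ships with standard R installations.

```r
set.seed(123)
x <- rnorm(10000)
y <- x + rnorm(10000, sd = 0.5)

semi <- adjustcolor("black", alpha.f = 0.4)   # "#00000066": black, 40% opaque
plot(x, y, pch = 20, col = semi, main = "Semi-transparent points")
plot(x, y, pch = ".", main = "Single-pixel points")
smoothScatter(x, y, main = "smoothScatter() density")
```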
ggplot2 graphics
Create a scatterplot showing the relationship between the square root of weight and height using the ggplot2 library, and annotate the plot. Two equivalent ways to create the plot are shown in the solution.
library(ggplot2)
## 'quick' plot
qplot(Height, sqrt(Weight), data=brfss)
## Warning: Removed 735 rows containing missing values (geom_point).
## specify the data set and 'aesthetics', then how to plot
ggplot(brfss, aes(x=Height, y=sqrt(Weight))) +
    geom_point()
## Warning: Removed 735 rows containing missing values (geom_point).
qplot() gives us a warning stating that it has removed rows
containing missing values. This is actually very helpful, because we
find out that our dataset contains NAs and we can make a design
decision about what we'd like to do with these NAs. We can find
which rows contain NA using is.na(), and count
the number of rows with NA values using sum():
sum(is.na(brfss$Height))
## [1] 184
sum(is.na(brfss$Weight))
## [1] 649
drop <- is.na(brfss$Height) | is.na(brfss$Weight)
sum(drop)
## [1] 735
Remove the rows which contain NAs in Height or Weight.
brfss <- brfss[!drop,]
The plot is annotated with ylab() and ggtitle():
qplot(Height, sqrt(Weight), data=brfss) +
    ylab("Square root of Weight") +
    ggtitle("All Years, Both Sexes")
Color the female and male points differently.
ggplot(brfss, aes(x=Height, y=sqrt(Weight), color=Sex)) + 
    geom_point()
One can also change the shape of the points for the female and male
groups
ggplot(brfss, aes(x=Height, y = sqrt(Weight), color=Sex, shape=Sex)) + 
    geom_point()
or plot Male and Female in different panels using 
facet_grid()
ggplot(brfss, aes(x=Height, y = sqrt(Weight), color=Sex)) + 
    geom_point() +
    facet_grid(Sex ~ .)
Create a subset of the data containing only observations from 2010
and make density curves for male and female groups. Use the fill
aesthetic to indicate that each sex is to be calculated separately,
and geom_density() for the density plot.
brfss2010 <- brfss[brfss$Year == "2010", ]
ggplot(brfss2010, aes(x=sqrt(Weight), fill=Sex)) +
    geom_density(alpha=.25)
Plotting large numbers of points means that they are often over-plotted, potentially obscuring important patterns. Make the points semi-transparent using alpha. Here we make them 60% transparent (alpha=.4, i.e., 40% opaque). The solution illustrates a nice feature of ggplot2 – a partially specified plot can be assigned to a variable, and the variable modified at a later point.
sp <- ggplot(brfss, aes(x=Height, y=sqrt(Weight)))
sp + geom_point(alpha=.4)
Add a fitted regression model to the scatter plot.
sp + geom_point() + stat_smooth(method=lm)
## `geom_smooth()` using formula 'y ~ x'
By default, stat_smooth() also adds a 95% confidence region for
the regression fit. The confidence level can be changed by
setting level, or the region can be disabled with se=FALSE.
sp + geom_point() + stat_smooth(method=lm, level=0.99)  # default level is 0.95
sp + geom_point() + stat_smooth(method=lm, se=FALSE)
How do you fit a linear regression line for each group? First we’ll make the base plot object sps, then we’ll add the linear regression lines to it.
sps <- ggplot(brfss, aes(x=Height, y=sqrt(Weight), colour=Sex)) +
    geom_point() +
    scale_colour_brewer(palette="Set1")
sps + geom_smooth(method="lm")
## `geom_smooth()` using formula 'y ~ x'
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur/Monterey 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.6    BiocStyle_2.24.0
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.2    xfun_0.31           bslib_0.3.1        
##  [4] purrr_0.3.4         splines_4.2.0       lattice_0.20-45    
##  [7] colorspace_2.0-3    vctrs_0.4.1         generics_0.1.2     
## [10] htmltools_0.5.2     yaml_2.3.5          mgcv_1.8-40        
## [13] utf8_1.2.2          rlang_1.0.2         jquerylib_0.1.4    
## [16] pillar_1.7.0        glue_1.6.2          withr_2.5.0        
## [19] DBI_1.1.2           RColorBrewer_1.1-3  lifecycle_1.0.1    
## [22] stringr_1.4.0       munsell_0.5.0       gtable_0.3.0       
## [25] codetools_0.2-18    evaluate_0.15       labeling_0.4.2     
## [28] knitr_1.39          fastmap_1.1.0       fansi_1.0.3        
## [31] highr_0.9           Rcpp_1.0.8.3        scales_1.2.0       
## [34] BiocManager_1.30.18 magick_2.7.3        jsonlite_1.8.0     
## [37] farver_2.1.0        digest_0.6.29       stringi_1.7.6      
## [40] bookdown_0.27       dplyr_1.0.9         grid_4.2.0         
## [43] cli_3.3.0           tools_4.2.0         magrittr_2.0.3     
## [46] sass_0.4.1          tibble_3.1.7        crayon_1.5.1       
## [49] pkgconfig_2.0.3     Matrix_1.4-1        ellipsis_0.3.2     
## [52] assertthat_0.2.1    rmarkdown_2.14      R6_2.5.1           
## [55] nlme_3.1-157        compiler_4.2.0
Research reported in this tutorial was supported by the National Human Genome Research Institute and the National Cancer Institute of the National Institutes of Health under award numbers U24HG004059 (Bioconductor), U24HG010263 (AnVIL) and U24CA180996 (ITCR).