GENESIS provides statistical methodology for analyzing genetic data from samples with population structure and/or familial relatedness. This vignette describes how to use GENESIS to infer population structure and to estimate relatedness measures such as kinship coefficients, identity by descent (IBD) sharing probabilities, and inbreeding coefficients. GENESIS uses PC-AiR for population structure inference that is robust to known or cryptic relatedness, and it uses PC-Relate for accurate relatedness estimation in the presence of population structure, admixture, and departures from Hardy-Weinberg equilibrium.
The functions in the GENESIS package read genotype data from a GenotypeData class object as created by the GWASTools package. Through the use of GWASTools, a GenotypeData class object can easily be created from an R matrix of SNP genotype data, from a GDS file, or from PLINK files after conversion to a GDS file.
Example R code for creating a GenotypeData object is presented below. Much more detail can be found in the GWASTools package reference manual.
geno <- MatrixGenotypeReader(genotype = genotype, snpID = snpID, chromosome = chromosome, 
                             position = position, scanID = scanID)
genoData <- GenotypeData(geno)

genotype is a matrix of genotype values coded as 0 / 1 / 2, where rows index SNPs and columns index samples
snpID is an integer vector of unique SNP IDs
chromosome is an integer vector specifying the chromosome of each SNP
position is an integer vector specifying the position of each SNP
scanID is a vector of unique individual IDs
filename is the file path to the GDS file, as passed to GdsGenotypeReader when reading from a GDS file instead of a matrix

The SNPRelate package provides the snpgdsBED2GDS function to convert binary PLINK files into a GDS file.
snpgdsBED2GDS(bed.fn = "genotype.bed", bim.fn = "genotype.bim", fam.fn = "genotype.fam", 
              out.gdsfn = "genotype.gds")

bed.fn is the file path to the PLINK .bed file
bim.fn is the file path to the PLINK .bim file
fam.fn is the file path to the PLINK .fam file
out.gdsfn is the file path for the output GDS file

Once the PLINK files have been converted to a GDS file, then a GenotypeData object can be created as described above.
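To make the input requirements above concrete, the following is a minimal sketch rather than a definitive recipe. It builds small toy inputs (the sizes, IDs, and positions are hypothetical values chosen for illustration), checks that they satisfy the constraints described for MatrixGenotypeReader, and creates a GenotypeData object only if GWASTools is installed. The second part sketches the PLINK-to-GDS route using the file names from the example above; it runs only if those files exist and the SNPRelate and GWASTools packages are available.

```r
# Toy inputs; all values here are hypothetical and for illustration only.
set.seed(1)
n_snp  <- 5L   # number of SNPs (rows)
n_scan <- 4L   # number of samples (columns)

# Genotypes coded 0 / 1 / 2; rows index SNPs, columns index samples.
genotype <- matrix(sample(0:2, n_snp * n_scan, replace = TRUE),
                   nrow = n_snp, ncol = n_scan)
snpID      <- seq_len(n_snp)                      # unique integer SNP IDs
chromosome <- rep(1L, n_snp)                      # all SNPs on chromosome 1
position   <- as.integer(c(1000, 2000, 3000, 4000, 5000))
scanID     <- 101:104                             # unique individual IDs

# Check the constraints described in the text before building the object.
stopifnot(
  nrow(genotype) == length(snpID),
  nrow(genotype) == length(chromosome),
  nrow(genotype) == length(position),
  ncol(genotype) == length(scanID),
  anyDuplicated(snpID) == 0,
  anyDuplicated(scanID) == 0,
  all(genotype %in% 0:2)
)

# Route 1: from an R matrix (only attempted if GWASTools is installed).
if (requireNamespace("GWASTools", quietly = TRUE)) {
  geno <- GWASTools::MatrixGenotypeReader(genotype = genotype, snpID = snpID,
                                          chromosome = chromosome,
                                          position = position, scanID = scanID)
  genoData <- GWASTools::GenotypeData(geno)
  print(genoData)
}

# Route 2: from PLINK files via a GDS file (only attempted if the example
# files from the text exist and both packages are installed).
plink_files <- c("genotype.bed", "genotype.bim", "genotype.fam")
if (all(file.exists(plink_files)) &&
    requireNamespace("SNPRelate", quietly = TRUE) &&
    requireNamespace("GWASTools", quietly = TRUE)) {
  SNPRelate::snpgdsBED2GDS(bed.fn = "genotype.bed", bim.fn = "genotype.bim",
                           fam.fn = "genotype.fam", out.gdsfn = "genotype.gds")
  gds_geno     <- GWASTools::GdsGenotypeReader(filename = "genotype.gds")
  gds_genoData <- GWASTools::GenotypeData(gds_geno)
  print(gds_genoData)
}
```

Checking dimensions and ID uniqueness up front is a design choice for the sketch: it reports mismatched inputs with a plain error before any reader object is constructed.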
To demonstrate PC-AiR and PC-Relate analyses with the GENESIS package, we analyze SNP data from two HapMap 3 population samples: Mexican Americans in Los Angeles, California (MXL) and African Americans in the southwestern USA (ASW). Mexican Americans and African Americans have diverse ancestral backgrounds, and familial relatives are present in these data. Genotype data at a subset of 20K autosomal SNPs for 173 individuals are provided as a GDS file.
# read in GDS data
gdsfile <- system.file("extdata", "HapMap_ASW_MXL_geno.gds", package="GENESIS")
HapMap_geno <- GdsGenotypeReader(filename = gdsfile)
# create a GenotypeData class object
HapMap_genoData <- GenotypeData(HapMap_geno)
HapMap_genoData

## An object of class GenotypeData 
##  | data:
## File: /tmp/RtmpSJrQVL/Rinst2522ed3f97b/GENESIS/extdata/HapMap_ASW_MXL_geno.gds (901.8K)
## +    [  ] *
## |--+ sample.id   { Int32,factor 173 ZIP(40.9%), 283B } *
## |--+ snp.id   { Int32 20000 ZIP(34.6%), 27.1K }
## |--+ snp.position   { Int32 20000 ZIP(34.6%), 27.1K }
## |--+ snp.chromosome   { Int32 20000 ZIP(0.13%), 103B }
## \--+ genotype   { Bit2 20000x173, 844.7K } *
##  | SNP Annotation:
## NULL
##  | Scan Annotation:
## NULL

Conomos, M.P., Reiner, A.P., Weir, B.S., & Thornton, T.A. (2016). Model-free Estimation of Recent Genetic Relatedness. American Journal of Human Genetics, 98(1), 127-148.
Conomos, M.P., Miller, M.B., & Thornton, T.A. (2015). Robust Inference of Population Structure for Ancestry Prediction and Correction of Stratification in the Presence of Relatedness. Genetic Epidemiology, 39(4), 276-293.
Gogarten, S.M., Bhangale, T., Conomos, M.P., Laurie, C.A., McHugh, C.P., Painter, I., … & Laurie, C.C. (2012). GWASTools: an R/Bioconductor package for quality control and analysis of Genome-Wide Association Studies. Bioinformatics, 28(24), 3329-3331.
Manichaikul, A., Mychaleckyj, J.C., Rich, S.S., Daly, K., Sale, M., & Chen, W.M. (2010). Robust relationship inference in genome-wide association studies. Bioinformatics, 26(22), 2867-2873.