library(MungeSumstats)

MungeSumstats now offers high-throughput query and import functionality for data from the MRC IEU Open GWAS Project.
This is made possible by the
IEU OpenGWAS R package: ieugwasr.
Before you can use this functionality however, please complete the following steps:
To authenticate, you need to generate a token from the OpenGWAS website. The
token behaves like a password, and it will be used to authorise the requests
you make to the OpenGWAS API. Here are the steps to generate the token and then
have ieugwasr automatically use it for your queries:
Add OPENGWAS_JWT=<token> to your .Renviron file; this can be edited in R by
running usethis::edit_r_environ().
Restart your R session so that the updated .Renviron file is read.
Run ieugwasr::get_opengwas_jwt(). If it returns a long random string then you are
authenticated.
Run ieugwasr::user(). It will make a
request to the API for your user information using your token. It should return
a list with your user information. If it returns an error, then your token is
not working.

We can search by terms and with other filters like sample size:
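Because every query relies on this token, it can be worth confirming that R can see it before searching. The following is a minimal sketch using only base R. The helper name has_opengwas_token is hypothetical and not part of MungeSumstats or ieugwasr, and it only checks that the OPENGWAS_JWT environment variable is non-empty, not that the API accepts the token.

```r
# Hypothetical helper (not part of MungeSumstats or ieugwasr):
# TRUE when the OPENGWAS_JWT environment variable is set to a non-empty value.
has_opengwas_token <- function() {
  nzchar(Sys.getenv("OPENGWAS_JWT", unset = ""))
}

# Warn early instead of waiting for an authentication error from the API.
if (!has_opengwas_token()) {
  message("OPENGWAS_JWT is not set; add it to .Renviron and restart R.")
}
```

With a token in place, the search call looks like this: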
#### Search for datasets ####
metagwas <- MungeSumstats::find_sumstats(traits = c("parkinson","alzheimer"), 
                                         min_sample_size = 1000)
head(metagwas,3)
ids <- (dplyr::arrange(metagwas, nsnp))$id

##          id               trait group_name year    author
## 1 ieu-a-298 Alzheimer's disease     public 2013   Lambert
## 2   ieu-b-2 Alzheimer's disease     public 2019 Kunkle BW
## 3 ieu-a-297 Alzheimer's disease     public 2013   Lambert
##                                                                                                                                                                                                                                                                                                                    consortium
## 1                                                                                                                                                                                                                                                                                                                        IGAP
## 2 Alzheimer Disease Genetics Consortium (ADGC), European Alzheimer's Disease Initiative (EADI), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer's Disease Consortium (GERAD/PERADES),
## 3                                                                                                                                                                                                                                                                                                                        IGAP
##                 sex population     unit     nsnp sample_size       build
## 1 Males and Females   European log odds    11633       74046 HG19/GRCh37
## 2 Males and Females   European       NA 10528610       63926 HG19/GRCh37
## 3 Males and Females   European log odds  7055882       54162 HG19/GRCh37
##   category                subcategory ontology mr priority     pmid sd
## 1  Disease Psychiatric / neurological       NA  1        1 24162737 NA
## 2   Binary Psychiatric / neurological       NA  1        0 30820047 NA
## 3  Disease Psychiatric / neurological       NA  1        2 24162737 NA
##                                                                      note ncase
## 1 Exposure only; Effect allele frequencies are missing; forward(+) strand 25580
## 2                                                                      NA 21982
## 3                Effect allele frequencies are missing; forward(+) strand 17008
##   ncontrol     N
## 1    48466 74046
## 2    41944 63926
## 3    37154 54162

You can also search by ID:
### By ID and sample size
metagwas <- find_sumstats(
  ids = c("ieu-b-4760", "prot-a-1725", "prot-a-664"),
  min_sample_size = 5000
)

You can supply import_sumstats() with a list of as many OpenGWAS IDs as you
want, but we’ll just give one to save time.
datasets <- MungeSumstats::import_sumstats(ids = "ieu-a-298",
                                           ref_genome = "GRCH37")

By default, import_sumstats returns a named list where the names are the Open
GWAS dataset IDs and the items are the respective paths to the formatted summary
statistics.
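Since the return value is an ordinary named list, base R tools are enough to work with it. The sketch below uses a made-up list, datasets_mock, with placeholder paths under tempdir() rather than real downloads, to show one way of looking up a path by ID and checking which files are present on disk.

```r
# Mock of the structure described above: names are dataset IDs, values are
# file paths. These paths are placeholders, not real downloads.
datasets_mock <- list(
  "ieu-a-298" = file.path(tempdir(), "ieu-a-298.tsv.gz"),
  "ieu-b-2"   = file.path(tempdir(), "ieu-b-2.tsv.gz")
)

# Look up the path for a single dataset by its ID.
one_path <- datasets_mock[["ieu-a-298"]]

# Named logical vector: does each listed file exist on disk?
exists_flags <- vapply(datasets_mock, file.exists, logical(1))
```

For the dataset imported above, the list looks like this: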
print(datasets)

## $`ieu-a-298`
## [1] "/tmp/Rtmph2Nn7U/ieu-a-298.tsv.gz"

You can easily turn this into a data.frame as well.
results_df <- data.frame(id=names(datasets), 
                         path=unlist(datasets))
print(results_df)

##                  id                             path
## ieu-a-298 ieu-a-298 /tmp/Rtmph2Nn7U/ieu-a-298.tsv.gz

Optional: Speed up with multi-threaded download via axel.
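The call that follows passes nThread = max(2, future::availableCores() - 2), which leaves two cores free but never requests fewer than two threads. As a rough illustration of that arithmetic, and of checking that the axel executable is on the PATH, here is a sketch in base R. pick_threads is a hypothetical helper introduced only for this example, and the core counts passed to it are illustrative.

```r
# Hypothetical helper mirroring max(2, cores - 2): keep `reserve` cores free,
# but never go below `minimum` threads.
pick_threads <- function(cores, reserve = 2, minimum = 2) {
  max(minimum, cores - reserve)
}

pick_threads(8)  # 6: two of eight cores are left free
pick_threads(2)  # 2: the floor applies, so no cores are left free

# Sys.which() returns "" when the executable is not found on the PATH.
axel_available <- nzchar(Sys.which("axel"))
```

Note that on a machine with one or two cores the floor of two means no cores are held back. With axel installed, the multi-threaded call is: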
datasets <- MungeSumstats::import_sumstats(ids = ids, 
                                           vcf_download = TRUE, 
                                           download_method = "axel", 
                                           nThread = max(2,future::availableCores()-2))

See the Getting started vignette for more information on how to use MungeSumstats and its functionality.
utils::sessionInfo()

## R version 4.5.1 Patched (2025-08-23 r88802)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] MungeSumstats_1.18.0 BiocStyle_2.38.0    
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1                           
##  [2] dplyr_1.1.4                                
##  [3] blob_1.2.4                                 
##  [4] R.utils_2.13.0                             
##  [5] Biostrings_2.78.0                          
##  [6] bitops_1.0-9                               
##  [7] fastmap_1.2.0                              
##  [8] RCurl_1.98-1.17                            
##  [9] VariantAnnotation_1.56.0                   
## [10] GenomicAlignments_1.46.0                   
## [11] XML_3.99-0.19                              
## [12] digest_0.6.37                              
## [13] lifecycle_1.0.4                            
## [14] KEGGREST_1.50.0                            
## [15] RSQLite_2.4.3                              
## [16] magrittr_2.0.4                             
## [17] compiler_4.5.1                             
## [18] rlang_1.1.6                                
## [19] sass_0.4.10                                
## [20] tools_4.5.1                                
## [21] yaml_2.3.10                                
## [22] data.table_1.17.8                          
## [23] rtracklayer_1.70.0                         
## [24] knitr_1.50                                 
## [25] S4Arrays_1.10.0                            
## [26] bit_4.6.0                                  
## [27] curl_7.0.0                                 
## [28] DelayedArray_0.36.0                        
## [29] ieugwasr_1.1.0                             
## [30] abind_1.4-8                                
## [31] BiocParallel_1.44.0                        
## [32] BiocGenerics_0.56.0                        
## [33] R.oo_1.27.1                                
## [34] grid_4.5.1                                 
## [35] stats4_4.5.1                               
## [36] SummarizedExperiment_1.40.0                
## [37] cli_3.6.5                                  
## [38] rmarkdown_2.30                             
## [39] crayon_1.5.3                               
## [40] generics_0.1.4                             
## [41] BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
## [42] httr_1.4.7                                 
## [43] rjson_0.2.23                               
## [44] DBI_1.2.3                                  
## [45] cachem_1.1.0                               
## [46] stringr_1.5.2                              
## [47] parallel_4.5.1                             
## [48] AnnotationDbi_1.72.0                       
## [49] BiocManager_1.30.26                        
## [50] XVector_0.50.0                             
## [51] restfulr_0.0.16                            
## [52] matrixStats_1.5.0                          
## [53] vctrs_0.6.5                                
## [54] Matrix_1.7-4                               
## [55] jsonlite_2.0.0                             
## [56] bookdown_0.45                              
## [57] IRanges_2.44.0                             
## [58] S4Vectors_0.48.0                           
## [59] bit64_4.6.0-1                              
## [60] GenomicFiles_1.46.0                        
## [61] GenomicFeatures_1.62.0                     
## [62] jquerylib_0.1.4                            
## [63] glue_1.8.0                                 
## [64] codetools_0.2-20                           
## [65] stringi_1.8.7                              
## [66] GenomeInfoDb_1.46.0                        
## [67] BiocIO_1.20.0                              
## [68] GenomicRanges_1.62.0                       
## [69] UCSC.utils_1.6.0                           
## [70] tibble_3.3.0                               
## [71] pillar_1.11.1                              
## [72] SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.24   
## [73] htmltools_0.5.8.1                          
## [74] Seqinfo_1.0.0                              
## [75] BSgenome_1.78.0                            
## [76] R6_2.6.1                                   
## [77] evaluate_1.0.5                             
## [78] lattice_0.22-7                             
## [79] Biobase_2.70.0                             
## [80] R.methodsS3_1.8.2                          
## [81] png_0.1-8                                  
## [82] Rsamtools_2.26.0                           
## [83] cigarillo_1.0.0                            
## [84] memoise_2.0.1                              
## [85] bslib_0.9.0                                
## [86] SparseArray_1.10.0                         
## [87] xfun_0.53                                  
## [88] MatrixGenerics_1.22.0                      
## [89] pkgconfig_2.0.3