Downloads
The GWAS results are available for download in two main formats:
- Per-phenotype flat files: for most analyses of one or a few phenotypes, we suggest using the per-phenotype flat files, available freely on Amazon AWS. More information on the file formats is available in the Technical Details.
- The phenotype manifest (browse on Google Sheets or download on Amazon AWS) contains the location of, and detailed information on, the per-phenotype file for every phenotype for which a GWAS was run.
- The variant manifest contains detailed information on each variant (download on Amazon AWS, along with its tabix index, tbi).
- Please note that p-values are now stored as natural log p-values to avoid underflow (i.e., ln P, not -ln P or -log10 P).
- Hail format: For large-scale analyses of many phenotypes, we provide the full dataset in Hail MatrixTable format on Google Cloud.
- Please note that the previous iteration of release files has been archived at s3://pan-ukb-us-east-1/archive_20200615/.
- All data are now additionally hosted on the Genomics Data Lake - Azure Open Datasets.
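As a minimal sketch of working with the ln P convention, the Python snippet below streams a per-phenotype flat file and converts natural-log p-values to -log10 P by dividing by ln 10, instead of exponentiating (exp(ln P) underflows to 0.0 once ln P drops below roughly -745). It assumes the file is a bgzip-compressed, tab-separated table with a header row, which the standard gzip module can read because bgzip output is gzip-compatible. The function names, the `pval_column` argument, and the 5e-8 default (the conventional genome-wide significance level) are illustrative assumptions, not part of the Pan-UKB release; check the Technical Details for the actual column names.

```python
import csv
import gzip
import math
from typing import Dict, Iterator, Tuple

LN10 = math.log(10)


def neg_log10_p(ln_p: float) -> float:
    """Convert a natural-log p-value (ln P) to -log10 P without calling exp()."""
    return -ln_p / LN10


def top_hits(path: str, pval_column: str,
             alpha: float = 5e-8) -> Iterator[Tuple[Dict[str, str], float]]:
    """Yield (row, -log10 P) for rows with P < alpha, comparing on the log scale.

    Hypothetical helper. Assumptions (not confirmed by the source): the file is a
    bgzip-compressed, tab-separated table with a header row, missing values are
    "" or "NA", and `pval_column` names a column holding ln P.
    """
    ln_alpha = math.log(alpha)  # compare ln P to ln(alpha) so P is never materialized
    # bgzip output is a series of gzip members, so the stdlib gzip module can stream it.
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            value = row[pval_column]
            if value in ("", "NA"):
                continue  # skip variants without a p-value in this column
            ln_p = float(value)
            if ln_p < ln_alpha:
                yield row, neg_log10_p(ln_p)
```

Comparing ln P directly against ln(alpha) keeps the whole filter on the log scale, so even p-values far smaller than the smallest representable double are ranked correctly.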
In addition, the LD matrices and scores are available in the following formats:
- LDSC-compatible flat files: for running LD score regression, we suggest using the LD score flat files available on Amazon AWS (download the tarball file here). More information on the file formats is available on the LDSC website.
- Hail format: For large-scale analyses, we provide the full LD matrices and scores in Hail format on Amazon AWS.
All heritability estimates (see here for more information on our approach) are available for download in the following formats:
- Flat files: the manifest flat file is available on Amazon AWS (tarball here) or on Google Sheets. Our topline results can be found as part of the main phenotype manifest (Amazon AWS or Google Sheets).
- Hail format: for large-scale analyses and integration with our other datasets, we provide heritability data in Hail format on Google Cloud Platform.
The phenotype correlation matrix, used in the construction of the maximally independent set of phenotypes passing QC (see here for details), is available on Amazon AWS (download the tarball file here).
The ancestry assignments (as well as corresponding principal components and covariates used in our analyses) are available for download through the UK Biobank portal as Return 2442. These are available to researchers registered with the UK Biobank: refer to instructions within the AMS portal to download these results.
Terms
All data here are released openly and publicly for the benefit of the wider biomedical community. You can freely download and search the data, and we encourage the use and publication of results generated from these data. From the perspective of the Pan-UKB team, there are absolutely no restrictions or embargoes on the publication of results derived from these data. However, we note that this research has been conducted using the UK Biobank Resource (project ID 31063), and use of these data is bound by all terms of usage of the UK Biobank: more information about these terms can be found here. All users of these data agree not to attempt to reidentify participants.
These data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose (see license information below). This dataset has been subjected to quality control, but variant calling and the statistical methods used to associate variants with phenotypes are imperfect and probabilistic processes, so many errors no doubt remain: if you find any glaring errors, feel free to contact us. Users of the dataset certify that they are in compliance with all applicable local, state, and federal laws or regulations and institutional policies regarding human subjects and genetics research.
The GWAS results data produced by the Pan-UKB are available free of restrictions under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. We request that you acknowledge and give attribution to both the Pan-UKB project and UK Biobank, and link back to the relevant page, wherever possible.
Citation
In addition to acknowledging the UK Biobank, we request that any use of this dataset in publications cite:
Karczewski, Gupta, Kanai et al., "Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects". medRxiv. 2024 Mar 15. doi: 10.1101/2024.03.13.24303864
There is no need to include us as authors on your manuscript, unless we contributed specific advice or analysis for your work.