Fixes to heritability estimates and adjustments to summary statistics schema

We have recently updated our summary statistics schema and our heritability estimates:

  1. We identified a bug in the computation of heritability point estimates that was caused by the use of incorrect allele frequencies. We have resolved the issue and recomputed all biobank-wide heritability analyses with all methods. The updated results can be found in the updated manifests.
  2. Using the new heritability estimates, we have recomputed QC for all summary statistics. This yields a largely overlapping (but not identical) set of 1,091 QC-pass phenotype-ancestry pairs, which is now used for the QC-pass meta-analysis. Per-phenotype, per-ancestry association statistics are unchanged.
  3. We have recomputed the maximally independent set using the same approach as described previously, now incorporating the updated set of QC-pass phenotypes.
  4. We have updated our summary statistics schema to include clearly labeled -log10 p-values rather than the ln p-values used previously. More information on the updated schema can be found on the per-phenotype files page. Archived summary statistics using the previous schema are available at the updated paths listed in the archived sheet of the phenotype manifest.
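
Values from the archived and current schemas are related by -log10 p = -ln p / ln 10. The sketch below converts an archived ln-p column to -log10 p. It is a minimal illustration, not part of the released tooling: the column is passed in as a list of strings, and the missing-value marker "NA" is an assumption about the file format.

```python
import math
from typing import List, Union

LN10 = math.log(10.0)


def neg_log10_from_ln(ln_p: float) -> float:
    """Convert a natural-log p-value (ln p <= 0) to -log10 p."""
    return -ln_p / LN10


def convert_column(values: List[str], missing: str = "NA") -> List[Union[str, float]]:
    """Convert ln-p strings to -log10 p, passing the (assumed) missing marker through."""
    return [v if v == missing else neg_log10_from_ln(float(v)) for v in values]


if __name__ == "__main__":
    # ln(1e-10) is about -23.0259, so the second entry converts to about 10.
    print(convert_column(["NA", "-23.025850929940457"]))
```

Because the conversion is a single division, it keeps full precision even for very small p-values. The p-value itself is never materialized, so it cannot underflow.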

These changes should help improve the clarity and quality of our released data.

Quality control, heritability analyses, and updates to summary statistics

We are excited to report significant updates to our summary statistics and data release:

  1. We performed heritability analyses across > 16,000 ancestry-trait pairs using several approaches.
  2. We developed a detailed summary statistics QC approach to prioritize the highest-quality phenotypes best suited for downstream analyses.
  3. We identified a maximally independent set of phenotypes that passed our QC filters.
  4. We recomputed summary statistics for traits whose extremely significant p-values had been reported with standard errors of 0. The recomputed statistics have non-zero standard errors and report log p-values to avoid numerical underflow.
  5. We updated cross-ancestry meta-analyses to incorporate updated summary statistics and also computed new meta-analyses using only QC-pass ancestry-trait pairs.

LD score comparisons between gnomAD and UK Biobank

In this analysis, we aimed to better understand the similarities and differences between imputation-based and sequence-based LD scores. To this end, we compared LD scores from gnomAD (sequence-based) and UKB (imputation-based) across four ancestry groups: African (AFR), Admixed American (AMR), East Asian (EAS), and North-Western European (EUR; NWE in gnomAD). This also serves as a more general comparison between LD scores from two separate cohorts. LD scores were obtained as follows.
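
The LD score of a variant j is the sum of its squared correlations with nearby variants, l_j = sum over k of r^2_jk, taken over a window around j. Comparing two sources therefore starts by matching variants present in both. This excerpt does not say how agreement was quantified, so the sketch below assumes a simple Pearson correlation over shared variants. The variant-ID format is a hypothetical "chrom:pos:ref:alt" string.

```python
import math
from typing import Dict, Tuple


def compare_ld_scores(ld_a: Dict[str, float], ld_b: Dict[str, float]) -> Tuple[int, float]:
    """Return (number of shared variants, Pearson r) for two LD-score maps.

    Keys are variant IDs in a shared (hypothetical) "chrom:pos:ref:alt" format;
    values are per-variant LD scores from each source (e.g. gnomAD and UKB).
    """
    shared = sorted(ld_a.keys() & ld_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared variants")
    xs = [ld_a[v] for v in shared]
    ys = [ld_b[v] for v in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("LD scores are constant in one source; r is undefined")
    return len(shared), cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical LD scores for a handful of variants in one ancestry group.
    gnomad = {"1:1000:A:G": 12.1, "1:2000:C:T": 30.4, "1:3000:G:A": 8.7}
    ukb = {"1:1000:A:G": 11.5, "1:2000:C:T": 28.9, "1:4000:T:C": 5.0}
    print(compare_ld_scores(gnomad, ukb))
```

Other reasonable choices include a regression slope or a rank correlation. Those differ mainly in how strongly a few high-LD regions influence the summary.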

Hosting update

We are so glad to see such enthusiastic interest in the pan-ancestry summary statistics! So much interest, in fact, that, as many of you may have noticed, we received repeated bans from Dropbox due to excess traffic. Today, we are pleased to report that we have found a new hosting solution on Amazon Web Services (AWS), which has graciously provided hosting for the full dataset in both of its forms (per-phenotype flat files and Hail format). We have updated the links in the phenotype manifest and on all pages of this site to reflect the new locations. The dataset is also now available on the Amazon Registry of Open Data.

First release

We are thrilled to announce the release of GWAS summary statistics from the Pan-UK Biobank resource, which consists of genome-wide association analyses of 7,221 phenotypes across 6 continental ancestry groups in the UK Biobank. In total, we conducted 16,131 GWAS across phenotype-ancestry pairs and, for each trait, meta-analyzed the summary statistics across all available populations. This release includes more than 20,000 individuals of primarily non-European ancestry, substantially increasing the diversity typically investigated in analyses of these data.
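
This post does not specify the meta-analysis estimator. The sketch below assumes the common fixed-effect inverse-variance-weighted (IVW) scheme for combining one variant's per-ancestry estimates. In that scheme, each ancestry's estimate is weighted by 1 / se^2, and the pooled standard error is sqrt(1 / sum of weights). The function name and inputs are illustrative.

```python
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas, ses: per-ancestry effect estimates and standard errors for one
    variant-trait pair. Returns the pooled (beta, se).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and the same length")
    weights = [1.0 / (se * se) for se in ses]  # w_i = 1 / se_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Hypothetical estimates for one variant in three ancestry groups.
    pooled_beta, pooled_se = ivw_meta([0.12, 0.08, 0.15], [0.05, 0.02, 0.09])
    print(pooled_beta, pooled_se, pooled_beta / pooled_se)  # beta, se, z
```

Groups with larger samples have smaller standard errors and so dominate the pooled estimate. This is why the much larger EUR group carries most of the weight for most traits.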

The number of phenotypes in this release, broken down by ancestry group and phenotype category, is:

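Population | Sample size | Total phenotypes | Categorical | Continuous | Phecode | ICD-10 | Biomarkers | Prescriptions
---------- | ----------- | ---------------- | ----------- | ---------- | ------- | ------ | ---------- | -------------
AFR        | 6,636       | 2,493            | 981         | 337        | 197     | 725    | 30         | 223
AMR        | 980         | 1,105            | 423         | 31         | 20      | 561    | 30         | 40
CSA        | 8,876       | 2,771            | 1,051       | 418        | 234     | 719    | 30         | 319
EAS        | 2,709       | 1,612            | 618         | 91         | 55      | 714    | 29         | 105
EUR        | 420,531     | 7,200            | 3,672       | 1,325      | 929     | 800    | 30         | 444
MID        | 1,599       | 1,372            | 509         | 83         | 52      | 591    | 30         | 107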
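
Two arithmetic checks connect the table to the text. First, the non-EUR sample sizes sum to 6,636 + 980 + 8,876 + 2,709 + 1,599 = 20,800, consistent with the "more than 20,000 individuals" figure. Second, in every row the six category counts add up to the row's total phenotype count. The short sketch below encodes the table rows and runs both checks; the tuple layout and function names are illustrative.

```python
from typing import Dict, Tuple

# (sample size, total phenotypes, categorical, continuous, phecode, ICD-10,
#  biomarkers, prescriptions), transcribed from the release table.
Row = Tuple[int, int, int, int, int, int, int, int]
TABLE: Dict[str, Row] = {
    "AFR": (6636, 2493, 981, 337, 197, 725, 30, 223),
    "AMR": (980, 1105, 423, 31, 20, 561, 30, 40),
    "CSA": (8876, 2771, 1051, 418, 234, 719, 30, 319),
    "EAS": (2709, 1612, 618, 91, 55, 714, 29, 105),
    "EUR": (420531, 7200, 3672, 1325, 929, 800, 30, 444),
    "MID": (1599, 1372, 509, 83, 52, 591, 30, 107),
}


def non_eur_sample_size(table: Dict[str, Row]) -> int:
    """Total sample size across all ancestry groups other than EUR."""
    return sum(row[0] for pop, row in table.items() if pop != "EUR")


def category_totals_match(table: Dict[str, Row]) -> bool:
    """True if the six per-category counts add up to the total in every row."""
    return all(sum(row[2:]) == row[1] for row in table.values())


if __name__ == "__main__":
    print(non_eur_sample_size(TABLE), category_totals_match(TABLE))
```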