LD scores and matrices

Overview

We computed in-sample dosage-based LD matrices and scores for each of the six ancestry groups in UKBB. LD matrices are available in Hail's BlockMatrix format on Amazon AWS (see details here). LD scores are available in LDSC-compatible flat files (.l2.ldscore.gz and .M_5_50) here. For large-scale analysis, you can also find a full LD score Hail Table (not restricted to the HapMap3 variants) on Amazon AWS (see details here).

Technical details of the LD computation are given below. All the code is also publicly available here. Detailed instructions on how to run LD score regression are available on LDSC's website.

LD matrices

  • The dosage-based genotype matrix $X$ was column-wise mean-centered and normalized.
  • We applied the same variant QC filter used for the Pan-UKB GWAS (INFO > 0.8, MAC > 20 in each population; see details here).
  • For covariate correction, the residuals from the regression of $genotype \sim covariates$ were obtained via $X_{adj} = M_cX$, where $M_c = I - C(C^TC)^{-1}C^T$ is the residual-maker matrix and $C$ is the matrix of covariates.
  • We used the same covariates used for the Pan-UKB GWAS, namely $age$, $sex$, $age*sex$, $age^2$, $age^2*sex$, and the first 10 PCs of the genotype matrix (see details here).
  • We then computed the LD matrix $R$ via $R = \frac{X_{adj}^TX_{adj}}{n}$ with a radius of 10 Mb. Each element $\hat{r}_{jk}$ of $R$ represents the Pearson correlation coefficient of genotypes between variants $j$ and $k$.
  • For the X chromosome, we computed an LD matrix jointly using both males and females, where male genotypes are coded 0/1 and female genotypes are coded 0/1/2.
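
The steps above can be illustrated with a small NumPy sketch. It is not the project's Hail pipeline, only a dense toy version of the same formulas under stated assumptions: the function names are hypothetical, "normalized" is taken to mean unit variance with the population standard deviation, the toy covariate matrix includes an intercept column, and all variants are assumed to lie on one chromosome so that the radius can be applied to base-pair positions directly.

```python
import numpy as np

def standardize_columns(X):
    """Column-wise mean-center and scale to unit variance (ddof=0 is an assumption)."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    return X / X.std(axis=0)

def residualize(X, C):
    """Return M_c X with M_c = I - C (C^T C)^{-1} C^T, without forming M_c itself."""
    C = np.asarray(C, dtype=float)
    beta = np.linalg.solve(C.T @ C, C.T @ X)  # least-squares coefficients
    return X - C @ beta

def ld_matrix(X_adj, positions, radius_bp=10_000_000):
    """R = X_adj^T X_adj / n, keeping only pairs within radius_bp of each other."""
    n = X_adj.shape[0]
    R = X_adj.T @ X_adj / n
    pos = np.asarray(positions)
    within = np.abs(pos[:, None] - pos[None, :]) <= radius_bp
    return np.where(within, R, 0.0)

# Toy data: 200 samples, 5 variants, intercept plus 3 random covariates.
rng = np.random.default_rng(0)
n = 200
X = rng.binomial(2, 0.3, size=(n, 5)).astype(float)
C = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
positions = [1_000_000, 2_000_000, 5_000_000, 14_000_000, 30_000_000]

X_adj = residualize(standardize_columns(X), C)
R = ld_matrix(X_adj, positions)
print(np.round(R, 3))
```

Solving the normal equations instead of building the $n \times n$ matrix $M_c$ gives the same residuals while avoiding a matrix that would be impractically large at biobank scale.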

LD scores

  • To account for an upward bias of the standard estimator of the Pearson correlation coefficient, we applied a bias adjustment for $\hat{r}^2_{jk}$ using $\tilde{r}^2_{jk} = \frac{n-1}{n-2}\hat{r}^2_{jk} - \frac{1}{n-2}$.
  • LD scores for variant $j$ were subsequently computed via $l_j = \sum_k \tilde{r}^2_{jk}$ with a radius of 1 Mb.
  • For LDSC-compatible flat files, we only exported LD scores of high-quality HapMap 3 variants that are 1) in autosomes, 2) not in the MHC region, 3) biallelic SNPs, 4) with INFO > 0.9, and 5) MAF > 1% in UKB and gnomAD genome/exome (if available).
  • We note that, since we applied covariate adjustment above, these LD scores are equivalent to the covariate-adjusted LD scores as described in Luo, Y. & Li, X. et al., 2020.
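
As a rough illustration of the bias adjustment and the windowed sum, the following NumPy sketch computes LD scores from a small dense correlation matrix. The function names and toy values are hypothetical. It assumes that the sum over $k$ includes the variant itself and that all variants lie on one chromosome.

```python
import numpy as np

def adjusted_r2(r, n):
    """Bias-adjusted r^2: (n-1)/(n-2) * r^2 - 1/(n-2)."""
    return (n - 1) / (n - 2) * np.square(r) - 1.0 / (n - 2)

def ld_scores(R, positions, n, radius_bp=1_000_000):
    """l_j = sum over k within radius_bp of variant j of the adjusted r^2_jk."""
    pos = np.asarray(positions)
    within = np.abs(pos[:, None] - pos[None, :]) <= radius_bp
    return np.where(within, adjusted_r2(np.asarray(R, dtype=float), n), 0.0).sum(axis=1)

# Toy correlation matrix for 3 variants estimated from n = 102 samples.
R_toy = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.1],
    [0.2, 0.1, 1.0],
])
positions_toy = [100_000, 600_000, 1_500_000]

scores = ld_scores(R_toy, positions_toy, n=102)
print(scores)
```

In this toy example the first and third variants are 1.4 Mb apart, so their pair falls outside the 1 Mb radius and does not contribute to either score. Note that the adjustment leaves $\hat{r}^2 = 1$ unchanged and maps $\hat{r}^2 = 0$ to $-\frac{1}{n-2}$.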