LD score comparisons between gnomAD and UK Biobank

In this analysis, we aimed to better understand the similarities and differences between imputation-based and sequence-based LD scores. To this end, we compared LD scores from gnomAD (sequence-based) and UKB (imputation-based) across four ancestry groups: African (AFR), Admixed American (AMR), East Asian (EAS), and North-Western European (EUR; NWE in gnomAD). This also serves as a more general comparison between LD scores from two separate cohorts. LD scores were obtained as follows.

UKB LD scores

LD scores in UKB were computed within each ancestry group. The genotype matrix $X$ was standardized, and variants were filtered to $\text{MAC} > 20$ (the same variant-level filter as used for the Pan UKBB GWAS). For covariate correction, the residuals from the regression of $\text{genotype} \sim \text{covariates}$ were obtained via $X_{adj} = M_cX$, where $M_c = I - C(C^TC)^{-1}C^T$ is the residual-maker matrix and $C$ is the matrix of covariates. The covariates used for adjustment were the same covariates used for the Pan-UKB GWAS, namely $age$, $sex$, $age*sex$, $age^2$, $age^2*sex$, and the first 10 PCs of the genotype matrix (more information about covariate selection can be found at the Pan UKBB website). The LD matrix was produced via $\hat{r} = \frac{X_{adj}^TX_{adj}}{n}$ with a window size of 1 MB. A bias adjustment for $\hat{r}^2$ was performed by $\tilde{r}^2 = \frac{n-1}{n-2}\hat{r}^2 - \frac{1}{n-2}$. LD scores were subsequently computed using $\tilde{r}^2$ with a radius of 1 MB.
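The steps in this paragraph can be sketched in NumPy as follows. This is a minimal illustration on simulated toy data, not the pipeline that produced the released LD scores. The function names (`standardize`, `residualize`, `ld_scores`) are hypothetical, the dense variant-by-variant matrix is only practical for a handful of variants, and including each variant's own $\tilde{r}^2$ in its LD score is an assumption.

```python
import numpy as np


def standardize(G):
    """Center each variant (column) and scale it to unit variance."""
    return (G - G.mean(axis=0)) / G.std(axis=0)


def residualize(X, C):
    """Return M_c X, where M_c = I - C (C^T C)^{-1} C^T.

    Solved by least squares so that the n x n matrix M_c is never formed.
    """
    beta, *_ = np.linalg.lstsq(C, X, rcond=None)
    return X - C @ beta


def ld_scores(X_adj, pos, radius=1_000_000):
    """Sum bias-adjusted r^2 over variants within `radius` base pairs.

    Assumption: the sum includes the variant itself.
    """
    n = X_adj.shape[0]
    r_hat = X_adj.T @ X_adj / n
    r2_adj = (n - 1) / (n - 2) * r_hat**2 - 1 / (n - 2)
    in_window = np.abs(pos[:, None] - pos[None, :]) <= radius
    return np.where(in_window, r2_adj, 0.0).sum(axis=1)


# Toy data: 500 individuals, 6 variants, an intercept plus 2 covariates.
rng = np.random.default_rng(0)
n, m = 500, 6
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
C = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
pos = np.arange(m) * 400_000  # hypothetical base-pair positions

scores = ld_scores(residualize(standardize(G), C), pos)
```

A real implementation would compute $\hat{r}$ only for variant pairs inside the window (for example block by block along each chromosome) rather than forming the full matrix and masking it.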

gnomAD LD scores

LD scores were previously computed from individuals in the gnomAD cohort within each ancestry. After filtering variants to $\text{MAF} > 0.005$ and standardizing the genotype matrix $X$, the LD matrix was constructed via $\hat{r} = \frac{X^TX}{n}$ with a radius of 1 MB. A bias adjustment for $\hat{r}^2$ was performed by $\tilde{r}^2 = \frac{n-1}{n-2}\hat{r}^2 - \frac{1}{n-2}$. LD scores were subsequently computed using $\tilde{r}^2$ with variants with $\text{AF} > 0.01$ and sufficiently high call rate ($> 0.8$) with a radius of 1 MB.
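The bias adjustment used for both cohorts can be rewritten as the raw $\hat{r}^2$ minus a small correction term, which may make its effect easier to see. The short derivation below is only algebra on the formula above. The final remark on the expectation relies on a standard result for the sample correlation of uncorrelated, approximately normal variables, which is an assumption that genotype data satisfy only roughly.

```latex
\begin{aligned}
\tilde{r}^2
  &= \frac{n-1}{n-2}\,\hat{r}^2 - \frac{1}{n-2} \\
  &= \frac{(n-2)\,\hat{r}^2 + \hat{r}^2 - 1}{n-2} \\
  &= \hat{r}^2 - \frac{1-\hat{r}^2}{n-2}.
\end{aligned}
% If the true correlation is zero, E[\hat{r}^2] \approx 1/(n-1), so
% E[\tilde{r}^2] \approx \frac{n-1}{n-2}\cdot\frac{1}{n-1} - \frac{1}{n-2} = 0.
```

In this form the correction is largest for weakly correlated pairs and vanishes as $\hat{r}^2 \to 1$ or as $n$ grows.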

Comparisons

We compared LD scores for gnomAD individuals from ancestries that were also analyzed in UKB, namely individuals with African (AFR), Admixed American (AMR), East Asian (EAS), and non-Finnish European (EUR; NFE in gnomAD) ancestry. We restricted our comparisons to HapMap3 SNPs.

Figure 1: Pairwise comparisons of LD scores in UKB vs. gnomAD within ancestries. The red line is the $y=x$ line. SNPs were subsampled randomly prior to plotting to improve readability.

For these four populations, we find very strong concordance of LD scores when comparing gnomAD and UKB (Figure 1). $r^2$ values were consistently above $0.9$, with the highest values observed for EUR and EAS. Correlation values for AMR and AFR were slightly lower. We suspect that these differences may be due to (1) the higher degree of genetic diversity in these populations and (2) systematic differences in these populations between the gnomAD and UKB cohorts.
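A per-ancestry comparison of this kind could be sketched as below. This is an illustration under stated assumptions, not the analysis code: the function name `compare_ld_scores`, the dict-based inputs keyed by SNP identifier, and the toy values are all hypothetical, and $r^2$ is taken here to mean the squared Pearson correlation between the two sets of LD scores.

```python
import numpy as np


def compare_ld_scores(scores_a, scores_b, keep_snps):
    """Squared Pearson correlation of LD scores over shared SNPs.

    scores_a, scores_b: dicts mapping SNP identifier -> LD score
    keep_snps: identifiers to restrict to (e.g. a HapMap3 SNP list)
    """
    shared = sorted(set(scores_a) & set(scores_b) & set(keep_snps))
    a = np.array([scores_a[s] for s in shared])
    b = np.array([scores_b[s] for s in shared])
    r = np.corrcoef(a, b)[0, 1]
    return shared, r**2


# Hypothetical toy inputs; identifiers and values are made up.
cohort_a = {"rs1": 1.0, "rs2": 2.0, "rs3": 3.0, "rs4": 10.0}
cohort_b = {"rs1": 2.0, "rs2": 4.0, "rs3": 6.0, "rs5": 1.0}
restrict_to = {"rs1", "rs2", "rs3", "rs4"}

shared, r2 = compare_ld_scores(cohort_a, cohort_b, restrict_to)
```

Only SNPs present in both cohorts and in the restriction list contribute, so the number of SNPs compared can differ between ancestries.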