
1 Biases and Reconciliation in Estimates of Linkage Disequilibrium in the Human Genome 
Itsik Pe’er, Yves R. Chretien, Paul I.W. de Bakker, Jeffrey C. Barrett, Mark J. Daly, David M. Altshuler  The American Journal of Human Genetics  Volume 78, Issue 4, Pages (April 2006) DOI: /502803 Copyright © 2006 The American Society of Human Genetics Terms and Conditions

2 Figure 1 Different allele-frequency spectra of public data sets. The fraction (Y-axis) of SNPs in each MAF bin (X-axis) is presented for each data set. Hereafter, we group available data by continent of predominant population origin: CE = CEPH European, WA = West African, and EA = East Asian. Although this grouping pools together different populations, it has been observed (Rosenberg et al. 2002) that this approximation explains the lion’s share of the genetic differences between populations and, for our analysis, is actually overconservative (potentially attempting to reconcile populations with different LD). A, Samples from individuals of northern European origin living in Utah, collected by the CEPH. B, Samples from the Yoruba people collected at Ibadan, Nigeria, from the Beni people from Nigeria, or from African Americans of predominantly WA origin (McKeigue et al. 2000). C, Han Chinese individuals living in Beijing, Japanese living in Tokyo, or individuals of Chinese ancestry living in Los Angeles (in all but the SeattleSNPs data set).

3 Figure 2 Differences in LD across all data sets, as measured by three measures. A, Mean absolute D′ between marker pairs as a function of distance between the two markers. B, Mean r² between marker pairs as a function of distance. C, Fraction of marker pairs having a proxy with r² greater than or equal to the threshold, as a function of that threshold. All SNPs are included without any filtering on the basis of frequencies. Error bars represent empirical 95% CIs estimated by the bootstrap resampling of 90% of the SNPs.
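For readers unfamiliar with the two pairwise measures named in the Figure 2 caption, the following is a minimal sketch of the standard textbook definitions of |D′| and r² for two biallelic SNPs. It assumes phased haplotypes coded as 0/1 alleles; the function name `pairwise_ld` and the input layout are illustrative assumptions and are not taken from the paper.

```python
def pairwise_ld(hap_a, hap_b):
    """Return (abs_d_prime, r2) for two biallelic SNPs, or None if undefined.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased chromosome (hypothetical input layout, assumed for illustration).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")

    p_a = sum(hap_a) / n  # frequency of allele 1 at the first SNP
    p_b = sum(hap_b) / n  # frequency of allele 1 at the second SNP
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # at least one SNP is monomorphic in this sample

    r2 = d * d / denom

    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max, r2
```

For example, `pairwise_ld([1, 1, 1, 0], [1, 0, 0, 0])` gives |D′| = 1 but r² of about 0.11, which illustrates why the two measures can behave differently when allele frequencies are unequal.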

4 Figure 3 Pairwise LD and correlation reconciled by matching allele frequencies and sample size. Mean absolute D′ (A) and r² (B) across data sets are shown as a function of distance for CE, WA, and EA populations, normalizing allele frequency to a uniform distribution and sample size to 46 chromosomes of unrelated individuals. This normalization reconciles LD and largely reconciles pairwise correlation, with the possible exception that ENCODE and HapMap are noticeably different, especially considering the fact that these data sets examined the same individuals.
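The Figure 3 caption describes normalizing each data set to a uniform allele-frequency distribution and to 46 chromosomes. The caption does not give the exact procedure, so the sketch below is only one plausible way to do this: bin SNPs by MAF, keep the same number of randomly chosen SNPs from every bin, and randomly subsample chromosomes. The function names, the number of bins, and the seeds are all assumptions made for illustration.

```python
import random


def thin_to_uniform_maf(mafs, n_bins=5, seed=0):
    """Return sorted indices of SNPs kept so every MAF bin has equal count.

    mafs: minor-allele frequencies in (0, 0.5]; values outside are skipped.
    n_bins and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for idx, maf in enumerate(mafs):
        if not 0 < maf <= 0.5:
            continue
        b = min(int(maf / 0.5 * n_bins), n_bins - 1)
        bins[b].append(idx)

    # The sparsest bin limits how many SNPs every bin can contribute.
    # If any bin is empty, nothing is kept.
    target = min(len(b) for b in bins)
    kept = []
    for b in bins:
        kept.extend(rng.sample(b, target))
    return sorted(kept)


def subsample_chromosomes(haplotypes, n_chrom=46, seed=0):
    """Keep a random subset of n_chrom chromosomes (columns) for every SNP.

    haplotypes: list of SNP rows, each a list of 0/1 alleles per chromosome.
    """
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(len(haplotypes[0])), n_chrom))
    return [[row[c] for c in cols] for row in haplotypes]
```

Because the thinning is random, a single replicate may be noisy; averaging a statistic over several replicates with different seeds is a natural complement to this sketch.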

5 Figure 4 Proxy rate reconciled by controlling for SNP density and region length. A, Fraction of SNPs with another SNP correlated at r² ≥ 0.8, as a function of SNP density for CE, WA, and EA populations. Proxy rate is shown across data sets with allele frequency normalized to be uniformly distributed and sample size set to 46 chromosomes of unrelated individuals. Proxy rate is largely reconciled by controlling for these factors, with the exception of SeattleSNPs. B, Proxy rate in SeattleSNPs compared with ENCODE, both uncontrolled (solid red line, for reference) and controlled for region length and sample size to match SeattleSNPs (dashed red line). These two data sets are similar in allele-frequency spectra and in SNP density but require normalization of region length for reconciliation, demonstrating the importance of this confounder (fig. B2).
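The proxy rate in the Figure 4 caption is the fraction of SNPs that have at least one other SNP correlated with them at r² ≥ 0.8. A minimal sketch of that calculation within a single region follows, assuming phased 0/1 haplotypes; the helper names are hypothetical, and treating r² as 0 when a SNP is monomorphic is a simplifying assumption of this sketch.

```python
def r_squared(hap_a, hap_b):
    """Squared allelic correlation between two 0/1 haplotype vectors."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # assumption: a monomorphic SNP cannot serve as a proxy
    d = p_ab - p_a * p_b
    return d * d / denom


def proxy_rate(haplotypes, threshold=0.8):
    """Fraction of SNPs with at least one other SNP at r2 >= threshold.

    haplotypes: list of SNP rows from one region, each a list of 0/1 alleles.
    """
    n_snps = len(haplotypes)
    if n_snps == 0:
        return 0.0
    has_proxy = [False] * n_snps
    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            if r_squared(haplotypes[i], haplotypes[j]) >= threshold:
                has_proxy[i] = True
                has_proxy[j] = True
    return sum(has_proxy) / n_snps
```

Because every SNP is compared with every other SNP in the region, the result depends on how many SNPs are present and how long the region is, which is consistent with the caption's point that density and region length need to be controlled.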

6 Figure 5 r² in ENCODE, as a function of resequencing depth. Effect of resequencing depth on ascertainment bias, as observed by the decay of average pairwise correlation (r², Y-axis) with distance (X-axis) in ENCODE CE data. Ascertainment of SNPs by the resequencing of a certain number of individuals is mimicked by discarding SNPs that are monomorphic in these individuals and controlling for allele-frequency spectrum differences.
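The ascertainment step in the Figure 5 caption, discarding SNPs that are monomorphic in a chosen set of resequenced individuals, can be sketched as below. The function name and the representation of the discovery panel as a list of chromosome column indices are assumptions for illustration; the subsequent control for allele-frequency spectrum is not shown.

```python
def ascertain_snps(haplotypes, panel_columns):
    """Return indices of SNPs that are polymorphic within a discovery panel.

    haplotypes: list of SNP rows, each a list of 0/1 alleles per chromosome.
    panel_columns: chromosome indices of the hypothetical discovery panel.
    """
    kept = []
    for idx, row in enumerate(haplotypes):
        alleles_seen = {row[c] for c in panel_columns}
        # A SNP would be "discovered" only if both alleles occur in the panel.
        if len(alleles_seen) > 1:
            kept.append(idx)
    return kept
```

Shrinking `panel_columns` imitates shallower resequencing: rare variants are less likely to show both alleles in a small panel, so they drop out first.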

7 Figure 6 HapMap phase II predicted to agree with other data sets in r². Although the HapMap phase I data set does not agree with ENCODE in r² when the latter is adjusted to the uniform MAF distribution, the recent completion of chromosome 2 in phase II shows that the phase II HapMap, if consistent with the chromosome 2 data, will agree completely with ENCODE in this respect. The chromosome 2 data from phase I are presented for comparison.

8 Figure A1 Robustness of the thinning procedure. To evaluate the effects of resampling, we examined 100 replicates of the most-severe thinnings performed, thinning SeattleSNPs to a flat allele-frequency spectrum (A) and thinning ENCODE by sample size and SNP density (B). We show that pairwise measures of LD, D′ (left panel) and r² (middle panel), require averaging over 10 replicates to provide reproducible averages, whereas single proxy-rate replicates (right panel) provide accurate results.

9 Figure A2 Analysis of D′ and r² in genes with Perlegen and HapMap. The decay of average pairwise LD (Y-axis) is shown with distance (X-axis), measured by D′ (A) and r² (B) with HapMap and Perlegen data in the three populations, CE, WA, and EA.

10 Figure A3 Effect of sample size on pairwise LD with ENCODE. The average pairwise LD (Y-axis) is shown as a function of distance (X-axis), measured by D′ (A) and r² (B) with ENCODE data in the three populations, CE, WA, and EA. Each curve represents a different number of unrelated individuals resampled from the full ENCODE data.

11 Figure B1 Effect of MAF on pairwise LD with ENCODE. The average pairwise LD (Y-axis) is shown as a function of distance (X-axis), measured by D′ (A) and r² (B) with ENCODE data in the three populations, CE, WA, and EA. Each curve averages a quartile of SNPs ranked by MAF.

12 Figure B2 Effects of density and region length on proxy count with ENCODE. Proxy count (Y-axis) is shown as a function of region length (A, X-axis) and density (B, X-axis).

13 Figure B3 LD in long versus short SeattleSNPs regions. SeattleSNPs regions were sorted by region length and were partitioned into subsets containing the longer and shorter regions, each containing half the SNPs. Mean r² versus genomic distance (A) and proxy rate (B) are shown for longer and shorter region sets.

14 Figure B4 Effect of ascertainment on r² with ENCODE and HapMap. A, Effect of dbSNP double-hit status on the decay of average pairwise correlation (r², Y-axis) with distance (X-axis) in ENCODE and HapMap data in the three populations, CE, WA, and EA. Data are stratified by the consideration of only single-hit or double-hit dbSNP SNPs at a time. All data sets are equalized to have the same (uniform) MAF spectrum. B, Pairwise correlation computed in ENCODE and HapMap (all individuals) only for double-hit SNPs with MAF ≥ 0.25. These ascertainment and frequency restrictions reconcile these data sets, suggesting that the discrepancy described above results from the differing ascertainment strategies between these data sets (see appendix B).

