Multiple-Locus Genome-Wide Association Testing
David Dean
CSE280A

Genome-wide Association Testing
Genome-wide association tests have used the concept of linkage disequilibrium (LD) to identify individual genes that correlate with disease phenotypes.
However, many human diseases arise out of the interaction of multiple genes, rather than just a single gene.

Linkage Disequilibrium
SNPs that are close to each other on a chromosome tend to be highly correlated, relative to SNPs that are far apart from each other. Recombination works to undo this correlation.
Without recombination: $P_{11} \neq P_{1*} P_{*1}$, and LD is measured by $D = |P_{11} - P_{1*} P_{*1}|$
With recombination, LD will decay with the distance between the two loci
Linkage equilibrium: $P_{11} = P_{1*} P_{*1}$ (loci are independent)

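The definition of D above can be checked on a small example. The sketch below assumes each SNP is stored as a list of 0/1 alleles, one entry per haplotype, and that allele 1 is the allele of interest at both loci; the function name `ld_d` is hypothetical and not part of the original slides.

```python
def ld_d(snp_a, snp_b):
    """Return D = |P11 - P1* P*1| for two equal-length 0/1 haplotype columns."""
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(snp_a)
    p_1star = sum(snp_a) / n  # frequency of allele 1 at the first locus
    p_star1 = sum(snp_b) / n  # frequency of allele 1 at the second locus
    # frequency of haplotypes carrying allele 1 at both loci
    p_11 = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    return abs(p_11 - p_1star * p_star1)


# Two identical columns are in complete LD; two independent ones give D = 0.
linked = ld_d([1, 1, 0, 0], [1, 1, 0, 0])
independent = ld_d([1, 1, 0, 0], [1, 0, 1, 0])
```

Here `linked` is 0.25 (the largest value possible when both allele frequencies are 0.5) and `independent` is 0.0, which matches the linkage equilibrium condition.
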
Disease Gene Mapping
The disease phenotypes of the individuals being studied can be treated as a column vector, similar to a column vector of SNPs. LD is used to find a locus that is close to the locus of interest.
If you find a locus (and a particular allele at that locus) that correlates highly with a particular disease phenotype, then one can infer that the allele “may play an important role” in the development of that disease.

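One way to picture treating the phenotype as just another column is to score each SNP column against it. The slides do not say which statistic is used; the sketch below assumes a Pearson chi-square statistic on the 2x2 table of phenotype (0 = control, 1 = case) against allele (0/1), and the names `chi_square_2x2` and `best_locus` are hypothetical.

```python
def chi_square_2x2(phenotype, snp):
    """Pearson chi-square statistic for a 0/1 phenotype column vs a 0/1 SNP column."""
    # counts[y][s] = number of individuals with phenotype y and allele s
    counts = [[0, 0], [0, 0]]
    for y, s in zip(phenotype, snp):
        counts[y][s] += 1
    n = len(phenotype)
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                # a constant column carries no evidence of association
                return 0.0
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat


def best_locus(phenotype, snp_columns):
    """Index of the SNP column with the largest statistic against the phenotype."""
    return max(range(len(snp_columns)),
               key=lambda j: chi_square_2x2(phenotype, snp_columns[j]))


phenotype = [1, 1, 0, 0]
snp_columns = [[1, 0, 1, 0], [1, 1, 0, 0]]
top = best_locus(phenotype, snp_columns)
```

In this toy data the second column tracks the phenotype exactly, so `top` is 1. A high score only flags a locus as a candidate; as the slide notes, the inference is that the allele may play a role, not that it does.
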
Epistasis
The interaction between genes, or epistasis, is an important area of genetics research, where much is still unknown.
For example, one gene may suppress the expression of another gene.
Gene-gene interactions can be synergistic (positive) or antagonistic (negative).

The Problem
Testing multiple loci across the whole genome that interact and contribute to a particular phenotype can present a computational challenge.
Example: $10^4$ individuals $\times$ $10^6$ SNPs
# of SNP pairs = $10^6 \times 10^6 = 10^{12}$
# of SNP trios = $10^6 \times 10^6 \times 10^6 = 10^{18}$

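The counts above are order-of-magnitude figures that treat each position in the pair or trio independently. If the order of the SNPs within a combination does not matter, the exact number of distinct combinations is a binomial coefficient, which is smaller by a constant factor but of similar scale. A quick check, using a hypothetical helper `count_tests`:

```python
import math


def count_tests(n_snps, order):
    """Number of distinct unordered SNP combinations of the given size."""
    return math.comb(n_snps, order)


n_snps = 10**6
pairs = count_tests(n_snps, 2)   # about 5 x 10^11
trios = count_tests(n_snps, 3)   # about 1.7 x 10^17
```

Either way of counting leads to the same conclusion: exhaustive testing of all trios is far out of reach, and even all pairs is expensive.
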
Objective
The objective is to discover an efficient method for genome-wide association testing that identifies multiple loci that may be interacting and contributing to a disease phenotype.

Evans et al
4 strategies tested:
1. Single-locus tests of association
2. Exhaustive two-locus search
   Fit all possible two-locus models of association to all pairs of SNPs
3. “Both Significant” two-stage strategy
   Applies a single-locus test to determine which loci to include in the second stage of pairwise association testing
4. “Either Significant” two-stage strategy
   Applies a single-locus test to determine a set of loci to then test in the second stage, but only requires 1 of the pair to pass the initial phase
The two-stage strategies were less powerful than the exhaustive two-locus search, but were able to significantly reduce the computational burden

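The difference between the two two-stage strategies lies only in which pairs reach the second stage. The sketch below illustrates that selection step, assuming the single-locus test has already produced one p-value per SNP; the function name `two_stage_pairs`, the `mode` argument, and the threshold `alpha` are hypothetical and are not taken from Evans et al.

```python
from itertools import combinations


def two_stage_pairs(p_values, alpha, mode):
    """Return the SNP index pairs that proceed to pairwise testing.

    mode="both":   both SNPs must pass the single-locus threshold.
    mode="either": at least one SNP of the pair must pass it.
    """
    passed = {j for j, p in enumerate(p_values) if p < alpha}
    if mode == "both":
        return list(combinations(sorted(passed), 2))
    if mode == "either":
        return [(i, j) for i, j in combinations(range(len(p_values)), 2)
                if i in passed or j in passed]
    raise ValueError("mode must be 'both' or 'either'")


p_values = [0.001, 0.5, 0.02, 0.9]
both = two_stage_pairs(p_values, alpha=0.05, mode="both")
either = two_stage_pairs(p_values, alpha=0.05, mode="either")
```

With these toy p-values, SNPs 0 and 2 pass the first stage, so `both` contains the single pair (0, 2) while `either` contains every pair except (1, 3). At genome scale the "either" list would be far larger than the "both" list, so a real implementation would likely stream pairs rather than build a list.
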
Current Project
Start with an $n \times m$ SNP matrix (Rana et al 2007)
  n = # of haplotypes (~$10^4$)
  m = # of SNPs (~$10^6$)
For a pair of SNPs, $s_1$ and $s_2$, the labeled Hamming distance is
  $H[s_1, s_2] = \min\{p_1 p_2 + q_1 q_2,\; p_1 q_2 + p_2 q_1\}$
  if H is low, then $s_1$ and $s_2$ are correlated
  if H is high, then $s_1$ and $s_2$ are uncorrelated
Formalize and quantify an efficient filtering method
  Identify a Hamming distance, $d_1$, to act as a threshold that filters out pairs that may be correlated
  This small subset can then be exhaustively tested for epistatic interactions

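A brute-force version of the filter helps make the threshold idea concrete. The slide does not define $p_i$ and $q_i$, so the sketch below does not implement that formula directly. Instead it assumes one common reading of a labeled Hamming distance: the fraction of haplotypes at which two 0/1 columns differ, minimized over relabeling one column (that is, also comparing against its bitwise complement). The function names are hypothetical.

```python
from itertools import combinations


def labeled_hamming(s1, s2):
    """Normalized Hamming distance between s1 and s2, or between s1 and the
    complement of s2, whichever is smaller (assumed interpretation)."""
    n = len(s1)
    mismatches = sum(a != b for a, b in zip(s1, s2))
    # distance to the complement of s2 is n - mismatches
    return min(mismatches, n - mismatches) / n


def candidate_pairs(columns, d1):
    """All column index pairs whose labeled Hamming distance is below d1."""
    return [(i, j) for i, j in combinations(range(len(columns)), 2)
            if labeled_hamming(columns[i], columns[j]) < d1]


columns = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
close = candidate_pairs(columns, d1=0.25)
```

Here columns 0 and 1 are exact complements, so their distance is 0 and `close` is `[(0, 1)]`. This version still visits every pair, so it only illustrates what the filter should return, not how to compute it efficiently.
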
Current Project
PairedSNPs($\delta$, $k$)
Repeat for $l$ iterations:
  Select $k$ rows of haplotypes at random
  For each SNP location, $j$, hash into the SNP vector $h_j$ and the bitwise complement $\hat{h}_j$
  Filter pairs of SNPs that have a Hamming distance $< d_1 n$
Identify all pairs of SNPs that are filtered out at least $(1 - \delta)\mu_1$ times
  $\mu_1$ is the expected number of times that a SNP pair is filtered out, if the Hamming distance is low ($= d_1$)
  $\mu_1 = l e^{-k d_1}$

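The procedure above reads like a random-projection hashing scheme, and the sketch below is one possible interpretation rather than the project's actual implementation. It assumes that a pair is "filtered out" in an iteration when the two columns, restricted to the $k$ sampled rows, land in the same hash bucket either directly or via the bitwise complement. It also assumes rows are sampled with replacement, so that two columns at normalized distance $d$ collide with probability $(1 - d)^k \approx e^{-kd}$, consistent with $\mu_1 = l e^{-k d_1}$. The parameters `l`, `d1`, and `seed` are passed explicitly here even though the slide's signature lists only $\delta$ and $k$.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def paired_snps(matrix, delta, k, l, d1, seed=0):
    """matrix: list of n haplotype rows, each a list of m 0/1 alleles."""
    rng = random.Random(seed)
    n, m = len(matrix), len(matrix[0])
    counts = defaultdict(int)  # pair -> number of iterations in which it collided
    for _ in range(l):
        rows = [rng.randrange(n) for _ in range(k)]  # sampled with replacement
        buckets = defaultdict(set)
        for j in range(m):
            key = tuple(matrix[r][j] for r in rows)
            comp = tuple(1 - b for b in key)  # bitwise complement of the key
            buckets[key].add(j)
            buckets[comp].add(j)
        # count each colliding pair at most once per iteration
        seen = set()
        for cols in buckets.values():
            seen.update(combinations(sorted(cols), 2))
        for pair in seen:
            counts[pair] += 1
    mu1 = l * math.exp(-k * d1)
    cutoff = (1 - delta) * mu1
    return sorted(pair for pair, c in counts.items() if c >= cutoff)


# Columns 0 and 1 are identical, column 2 is their complement,
# and column 3 differs from column 0 at half of the positions.
cols = [[1, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 0, 0, 1, 0, 1, 0],
        [0, 0, 1, 1, 0, 1, 0, 1],
        [1, 0, 0, 1, 1, 1, 0, 0]]
matrix = [list(row) for row in zip(*cols)]
hits = paired_snps(matrix, delta=0.0, k=4, l=20, d1=0.0, seed=1)
```

With `d1 = 0` and `delta = 0`, a pair is reported only if it collides in every iteration, which is guaranteed for identical or complementary columns. Pairs that survive this step form the small subset that would then be tested exhaustively for epistatic interactions.
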
Haploview
An open source application designed to analyze and visualize patterns of LD, and perform association testing on genetic data.
Haploview is developed and maintained by Dr. Mark Daly’s lab at MIT (Barrett et al 2005).

References
Barrett, J.C., Fry, B., Maller, J., and Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 2005.
Brinza, D., He, J., and Zelikovsky, A. Combinatorial search methods for multi-SNP disease association. Proc. of IEEE EMBS Annual International Conference.
Evans, D.M., Marchini, J., Morris, A.P., and Cardon, L.R. Two-stage two-locus models in genome-wide association. PLoS Genetics, 2:e157, Sep.
Rana, B.K., Insel, P.A., Payne, S.H., Abel, K., Beutler, E., Ziegler, M.G., Schork, N.J., and O’Connor, D.T. Population-based sample reveals gene-gender interactions in blood pressure in white Americans. Hypertension, 49:96-106, Jan 2007.