Multiple-Locus Genome-Wide Association Testing David Dean CSE280A.

Genome-wide Association Testing
Genome-wide association tests have used the concept of linkage disequilibrium (LD) to identify individual genes that correlate with disease phenotypes. However, many human diseases arise from the interaction of multiple genes rather than from a single gene.

Linkage Disequilibrium
SNPs that are close to each other on a chromosome tend to be highly correlated, relative to SNPs that are far apart. Recombination works to undo this correlation. Here $P_{11}$ is the frequency of haplotypes carrying allele 1 at both loci, and $P_{1*}$ and $P_{*1}$ are the frequencies of allele 1 at the first and second locus.
- Without recombination: $P_{11} \neq P_{1*} P_{*1}$, and $D = |P_{11} - P_{1*} P_{*1}|$
- With recombination, LD decays with the distance between the two loci.
- Linkage equilibrium: $P_{11} = P_{1*} P_{*1}$ (the loci are independent)
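The slide's quantities can be computed directly from two 0/1 haplotype columns. The sketch below is illustrative and not part of the original project: the function names are hypothetical, it assumes both SNPs are polymorphic, and it returns the signed D (the slide's D is its absolute value) together with the standard normalized measures D' and r². The second function uses the textbook random-mating result that LD shrinks by a factor (1 - r) per generation, where r is the recombination fraction between the loci; this is one way to make "LD decays with distance" concrete, since r grows with distance.

```python
def ld_stats(col1, col2):
    """Return (D, D', r^2) for two equal-length 0/1 haplotype columns.

    Assumes both SNPs are polymorphic (allele frequencies strictly between 0 and 1).
    D and D' are signed here; the slide's D is |D|.
    """
    n = len(col1)
    p11 = sum(1 for a, b in zip(col1, col2) if a == 1 and b == 1) / n  # P_11
    p1_ = sum(col1) / n  # P_1*: frequency of allele 1 at locus 1
    p_1 = sum(col2) / n  # P_*1: frequency of allele 1 at locus 2
    d = p11 - p1_ * p_1
    # Largest |D| achievable given the allele frequencies, used to normalize D.
    if d >= 0:
        d_max = min(p1_ * (1 - p_1), (1 - p1_) * p_1)
    else:
        d_max = min(p1_ * p_1, (1 - p1_) * (1 - p_1))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p1_ * (1 - p1_) * p_1 * (1 - p_1))
    return d, d_prime, r2


def ld_after_generations(d0, r, t):
    """LD left after t generations of random mating with recombination fraction r."""
    return d0 * (1 - r) ** t


# Perfectly correlated SNPs: D = 0.25, D' = 1, r^2 = 1.
print(ld_stats([1, 1, 0, 0], [1, 1, 0, 0]))
```

Signed D' lies in [-1, 1]; tools that report D' often show its absolute value instead.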

Disease Gene Mapping
The disease phenotypes of the individuals being studied can be treated as a column vector, similar to a column vector of SNPs. Because of LD, a SNP that correlates with the phenotype is likely to lie close to the locus of interest. If you find a locus (and a particular allele at that locus) that correlates highly with a particular disease phenotype, then one can infer that the allele, or a nearby variant in LD with it, “may play an important role” in the development of that disease.
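Treating the phenotype as one more 0/1 column makes the single-locus scan a matter of scoring every SNP column against the phenotype column. A minimal sketch, assuming case/control status is coded 0/1 and using squared Pearson correlation as the score (the same r² used for LD between two SNPs); the function names are hypothetical, and a real study would turn scores into p-values and correct for multiple testing rather than just rank them.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no association signal
    return cov * cov / (vx * vy)


def rank_snps_by_association(snp_columns, phenotype):
    """Return SNP indices sorted from most to least correlated with the phenotype."""
    scores = [r_squared(col, phenotype) for col in snp_columns]
    return sorted(range(len(snp_columns)), key=lambda j: scores[j], reverse=True)


phenotype = [1, 1, 0, 0]                # 1 = case, 0 = control
snps = [[1, 0, 1, 0], [1, 1, 0, 0]]     # SNP 1 tracks the phenotype exactly
print(rank_snps_by_association(snps, phenotype))
```

For two 0/1 columns, n·r² equals the Pearson chi-square statistic of their 2×2 table, so ranking by r² at fixed n matches ranking by that test statistic.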

Epistasis
The interaction between genes, or epistasis, is an important area of genetics research where much is still unknown. For example, one gene may suppress the expression of another. Gene-gene interactions can be synergistic (positive) or antagonistic (negative).
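A toy illustration of why interactions call for multi-locus testing: in the hypothetical two-locus penetrance table below, risk depends only on the combination of alleles, so averaging over the other locus leaves no difference in risk at either single locus. The table values and function name are invented for illustration, and the sketch assumes the two loci are in linkage equilibrium.

```python
def marginal_penetrance(f, p_b):
    """Marginal disease risk for each allele at locus A, averaging over locus B.

    f[a][b] is a hypothetical P(disease | allele a at locus A, allele b at locus B),
    with a, b in {0, 1}. p_b is the frequency of allele 1 at locus B; the loci are
    assumed independent (linkage equilibrium).
    """
    return [f[a][0] * (1 - p_b) + f[a][1] * p_b for a in (0, 1)]


# Interaction-only model: risk is high only when the two alleles differ.
interaction_model = [[0.1, 0.5],
                     [0.5, 0.1]]

# Model with a marginal effect: allele 1 at A raises risk, more so with allele 1 at B.
marginal_model = [[0.1, 0.1],
                  [0.1, 0.5]]

print(marginal_penetrance(interaction_model, 0.5))  # same risk for both alleles at A
print(marginal_penetrance(marginal_model, 0.5))     # allele 1 at A shows higher risk
```

A single-locus test at locus A would see no signal under the first model even though the pair of loci fully determines risk.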

The Problem
Testing multiple loci across the whole genome that interact and contribute to a particular phenotype can present a computational challenge.
Example: $10^4$ individuals $\times$ $10^6$ SNPs
- # of SNP pairs $\approx 10^6 \times 10^6 = 10^{12}$
- # of SNP trios $\approx 10^6 \times 10^6 \times 10^6 = 10^{18}$
Counted as unordered combinations, these are about $5 \times 10^{11}$ pairs and $1.7 \times 10^{17}$ trios, which is still far too many to test exhaustively.
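The counts above can be checked with exact binomial coefficients. A small sketch using Python's `math.comb`; the cost model (one pass over all individuals per combination tested) is a simplifying assumption for illustration, not a measured runtime, and the function names are hypothetical.

```python
from math import comb


def num_tests(m, order):
    """Number of unordered SNP combinations of a given order among m SNPs."""
    return comb(m, order)


def rough_cost(n, m, order):
    """Rough work estimate: one pass over n individuals per combination tested."""
    return n * comb(m, order)


n_individuals, n_snps = 10**4, 10**6
for order in (1, 2, 3):
    print(order,
          f"{num_tests(n_snps, order):.2e}",
          f"{rough_cost(n_individuals, n_snps, order):.2e}")
```

Even at pairs, the rough cost is on the order of $10^{15}$ genotype reads, which is why the strategies below try to avoid testing every combination.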

Objective
The objective is to discover an efficient method for genome-wide association testing that identifies multiple loci that may be interacting and contributing to a disease phenotype.

Evans et al.: Strategies Tested
Evans et al. tested 4 strategies:
1. Single-locus tests of association.
2. Exhaustive two-locus search: fit all possible two-locus models of association to all pairs of SNPs.
3. “Both Significant” two-stage strategy: apply a single-locus test to determine which loci to include in the second stage of pairwise association testing; both SNPs of a pair must pass.
4. “Either Significant” two-stage strategy: apply a single-locus test to determine a set of loci to test in the second stage, but require only one SNP of each pair to pass the initial stage.
The two-stage strategies were less powerful than the exhaustive two-locus search, but they significantly reduced the computational burden.
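The first stage of the two two-stage strategies amounts to a pair-selection rule. A minimal sketch of that selection step only (second-stage model fitting is not shown); the function name and the first-stage threshold `alpha1` are hypothetical and are not taken from Evans et al.

```python
from itertools import combinations


def two_stage_pairs(pvalues, alpha1, rule="both"):
    """Select SNP pairs for the second (pairwise) stage from single-locus p-values.

    rule="both":   keep a pair only if both SNPs pass the first-stage threshold.
    rule="either": keep a pair if at least one SNP passes.
    alpha1 is a hypothetical first-stage significance threshold.
    """
    m = len(pvalues)
    passed = sorted(j for j, p in enumerate(pvalues) if p <= alpha1)
    if rule == "both":
        return list(combinations(passed, 2))
    if rule == "either":
        pairs = set()
        for a in passed:  # O(len(passed) * m) work instead of O(m^2)
            for b in range(m):
                if b != a:
                    pairs.add((min(a, b), max(a, b)))
        return sorted(pairs)
    raise ValueError("rule must be 'both' or 'either'")


pvals = [0.001, 0.5, 0.01, 0.9]
print(two_stage_pairs(pvals, 0.05, "both"))
print(two_stage_pairs(pvals, 0.05, "either"))
```

With s SNPs passing stage one, "both" keeps $\binom{s}{2}$ pairs and "either" keeps $\binom{m}{2} - \binom{m-s}{2}$, which shows why "either" is costlier but can catch pairs where only one locus has a marginal effect.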

Current Project
Start with an n × m SNP matrix (Rana et al. 2007):
- n = # of haplotypes (~$10^4$)
- m = # of SNPs (~$10^6$)
For a pair of SNPs $s_1$ and $s_2$, the labeled Hamming distance is
$H[s_1, s_2] = \min\{p_1 p_2 + q_1 q_2,\; p_1 q_2 + p_2 q_1\}$
If H is low, then $s_1$ and $s_2$ are correlated; if H is high, then they are uncorrelated.
Formalize and quantify an efficient filtering method:
- Identify a Hamming-distance threshold $d_1$ that filters out the pairs that may be correlated.
- This small subset can then be exhaustively tested for epistatic interactions.
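A brute-force baseline for the filter helps fix ideas. This sketch assumes that the labeled Hamming distance of two 0/1 SNP columns is the smaller of their Hamming distance and the Hamming distance to the bitwise complement of one of them, so the arbitrary choice of which allele is "1" does not matter; that reading is consistent with hashing both a SNP vector and its complement in the PairedSNPs procedure, but the slide's formula uses $p$ and $q$ terms not defined here, so the sketch does not claim to reproduce it term by term. Function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of haplotypes (rows) at which two 0/1 SNP columns differ."""
    return sum(x != y for x, y in zip(a, b))


def labeled_hamming(a, b):
    """Hamming distance that ignores the arbitrary 0/1 allele labeling.

    Comparing a with the complement of b flips every position, so that
    distance is len(a) - hamming(a, b); take the smaller of the two.
    """
    d = hamming(a, b)
    return min(d, len(a) - d)


def exact_candidate_pairs(columns, d1):
    """Brute force: all SNP pairs with labeled distance below d1 * n, O(m^2 n) work."""
    n = len(columns[0])
    return [(i, j) for i, j in combinations(range(len(columns)), 2)
            if labeled_hamming(columns[i], columns[j]) < d1 * n]


cols = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]  # SNP 1 is SNP 0 with labels swapped
print(exact_candidate_pairs(cols, 0.25))
```

This exhaustive scan is exactly the quadratic cost the project aims to avoid; its value is as a reference answer for checking a faster filter on small inputs.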

Current Project
PairedSNPs(δ, k):
Repeat for l iterations:
- Select k rows of haplotypes at random.
- For each SNP location j, hash the SNP vector $h_j$ (restricted to the k selected rows) and its bitwise complement $\hat{h}_j$.
- Filter pairs of SNPs that have a Hamming distance $< d_1 n$.
Identify all pairs of SNPs that are filtered out at least $(1 - \delta)\mu_1$ times, where $\mu_1$ is the expected number of times a SNP pair is filtered out if its Hamming distance is low (normalized distance $= d_1$). Two SNPs at normalized distance $d_1$ agree on k randomly chosen rows with probability roughly $(1 - d_1)^k \approx e^{-k d_1}$, so over l iterations
$\mu_1 = l\,e^{-k d_1}$
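The loop above resembles locality-sensitive hashing by bit sampling. The sketch below is one possible reading, not the project's implementation: the function name and parameters are hypothetical, SNPs "collide" in an iteration when their k sampled bits match exactly or match after complementing one of them, and pairs are reported once their collision count reaches $(1 - \delta)\mu_1$. It omits a final check of each reported pair against the exact distance, which a practical version would likely add.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def paired_snps(matrix, k, l, d1, delta, seed=0):
    """Sketch of PairedSNPs(delta, k) by random row sampling.

    matrix: n rows (haplotypes), each a list of m 0/1 values (one per SNP).
    Returns SNP index pairs (i, j), i < j, that collided in at least
    (1 - delta) * mu1 of the l iterations, with mu1 = l * exp(-k * d1).
    """
    rng = random.Random(seed)
    n, m = len(matrix), len(matrix[0])
    counts = defaultdict(int)
    for _ in range(l):
        rows = rng.sample(range(n), k)  # k haplotype rows chosen at random
        buckets = defaultdict(list)
        for j in range(m):
            key = tuple(matrix[i][j] for i in rows)  # h_j restricted to the k rows
            comp = tuple(1 - b for b in key)         # its bitwise complement
            # Labeling each bucket by the smaller of key and complement puts SNPs
            # that match h_j, or its complement, into the same bucket.
            buckets[min(key, comp)].append(j)
        for bucket in buckets.values():
            for i, j in combinations(bucket, 2):  # j values were appended in order, so i < j
                counts[(i, j)] += 1
    mu1 = l * math.exp(-k * d1)
    return {pair for pair, c in counts.items() if c >= (1 - delta) * mu1}


c0 = [0, 0, 0, 0, 1, 1, 1, 1]
c2 = [1 - b for b in c0]              # complement of SNP 0
c3 = [0, 1, 0, 1, 0, 1, 0, 1]         # unrelated SNP
M = [[c0[i], c0[i], c2[i], c3[i]] for i in range(8)]
print(sorted(paired_snps(M, k=8, l=3, d1=0.01, delta=0.5)))
```

Larger k makes collisions of distant pairs rarer (fewer false candidates), while larger l reduces the chance of missing a truly close pair; $\delta$ trades off between the two at the reporting step.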

Haploview
An open-source application designed to analyze and visualize patterns of LD and to perform association testing on genetic data. Haploview is developed and maintained by Dr. Mark Daly’s lab at MIT (Barrett et al. 2005).

References
Barrett, J.C., Fry, B., Maller, J., and Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 2005.
Brinza, D., He, J., and Zelikovsky, A. Combinatorial search methods for multi-SNP disease association. Proc. of IEEE EMBS Annual International Conference.
Evans, D.M., Marchini, J., Morris, A.P., and Cardon, L.R. Two-stage two-locus models in genome-wide association. PLoS Genetics, 2:e157, Sep.
Rana, B.K., Insel, P.A., Payne, S.H., Abel, K., Beutler, E., Ziegler, M.G., Schork, N.J., and O’Connor, D.T. Population-based sample reveals gene-gender interactions in blood pressure in White Americans. Hypertension, 49:96-106, Jan 2007.