Data Mining in Linkage Disequilibrium Mapping Jing Hua Zhao Epidemiology June 2003.


Outline of the Talk: the problem; why data mining?; haplotype construction; challenging issues.

Current Paradigm: complex traits (Lander & Schork 1994); association mapping (Risch & Merikangas 1996); the need for both family-based and population-based studies (Hodge et al. 2003); # SNPs.

Linkage Disequilibrium: the raw data are genetic markers. LD is the non-random association between alleles at different loci. It contains information on the genetics of a population (selection, mutation, recombination, admixture).
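
The definition of LD above can be made concrete for the simplest case of two biallelic loci. The following is a minimal sketch, not taken from the talk: the function name `ld_measures` and its arguments are illustrative assumptions, and it computes the standard coefficient D = p_AB - p_A p_B together with the signed normalised D' and the squared correlation r².

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    All names are hypothetical; they are not from the original slides.
    """
    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' scales D by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Complete LD: allele A always travels with allele B.
    print(ld_measures(0.5, 0.5, 0.5))
    # Linkage equilibrium: haplotype frequency equals the product of allele frequencies.
    print(ld_measures(0.25, 0.5, 0.5))
```

D' is returned with its sign here; many reports quote its absolute value instead.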

A Model with LDs: a log-linear model allows for higher-order interactions (Weir & Wilson 1986) and is applicable to a variety of null hypotheses (Huttley & Wilson 2000), but the number of terms grows exponentially with the number of loci.
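
To illustrate why the number of terms becomes a problem, here is a small sketch that counts the terms of a saturated log-linear model. It assumes biallelic loci, so that every non-empty subset of loci contributes exactly one term; the function names are hypothetical and the parameterisation in Weir & Wilson (1986) may differ in detail.

```python
from math import comb


def terms_by_order(n_loci):
    """Number of terms of each order (1 = main effects, 2 = pairwise, ...).

    Assumes biallelic loci: one term per subset of loci of the given size.
    """
    return {order: comb(n_loci, order) for order in range(1, n_loci + 1)}


def total_terms(n_loci):
    """Total number of non-intercept terms, equal to 2**n_loci - 1."""
    return sum(terms_by_order(n_loci).values())


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, total_terms(k))
```

With 20 markers the saturated model already has more than a million terms, which is the motivation for searching a restricted set of patterns instead.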

Why Data Mining? 1.8 million SNPs; 1,240 hits on “haplotype and data mining” in 0.15 seconds. Data mining is the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and results (Berry & Linoff, 1997, 2000).

A Statistical Perspective: traditionally EDA, for a particular question. The sheer size of the data is problematic. Now DM could be defined as the process of secondary analysis of large databases aimed at finding unsuspected relationships which are of interest or value to the database owners (Hand 1998).

Haplotype Pattern Mining: Figure 1(a), strongly disease-associated haplotype patterns. Enumeration is by depth-first search (DFS), which has good running-time properties.
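
The depth-first enumeration mentioned above can be sketched as follows. This is a simplified illustration rather than the published haplotype pattern mining algorithm: a pattern is a tuple with `None` as a wildcard, patterns are extended left to right, and a branch is pruned once fewer than `min_support` haplotypes match it. Refinements such as gap and length limits are omitted, and all names are hypothetical.

```python
def enumerate_patterns(haplotypes, min_support):
    """Enumerate haplotype patterns matched by at least `min_support` haplotypes.

    haplotypes  : non-empty list of equal-length tuples of allele codes
    min_support : minimum number of matching haplotypes
    Returns a list of (pattern, support); None in a pattern is a wildcard.
    """
    n_markers = len(haplotypes[0])
    results = []

    def dfs(pattern, next_marker, rows):
        # Try fixing an allele at every marker to the right of the last fixed one.
        for m in range(next_marker, n_markers):
            for allele in sorted({haplotypes[i][m] for i in rows}):
                new_rows = [i for i in rows if haplotypes[i][m] == allele]
                # Support can only shrink as a pattern grows, so prune here.
                if len(new_rows) < min_support:
                    continue
                new_pattern = pattern[:m] + (allele,) + pattern[m + 1:]
                results.append((new_pattern, len(new_rows)))
                dfs(new_pattern, m + 1, new_rows)

    dfs((None,) * n_markers, 0, list(range(len(haplotypes))))
    return results


if __name__ == "__main__":
    haps = [(1, 1), (1, 1), (1, 2), (2, 2)]
    for pattern, support in enumerate_patterns(haps, min_support=2):
        print(pattern, support)
```

Each returned pattern would then be scored for disease association, for example with a chi-squared statistic on carrier counts among case and control chromosomes.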

Significance: a simple chi-squared statistic from a 2x2 table of disease-associated and control chromosomes, in accordance with D’, with significance determined via simulation. Simulation on prevalence, evolutionary history and sample size; robustness. Applicable to family data (Zhang et al. 2001).
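
As a hedged illustration of the statistic described above, the sketch below computes the Pearson chi-squared value for a 2x2 table and estimates significance by permuting case/control labels. The permutation scheme is an assumption made for the example (the slide only says "via simulation"), and the function names and arguments are hypothetical.

```python
import random


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no information
    return n * (a * d - b * c) ** 2 / denom


def pattern_statistic(carrier, is_case):
    """Chi-squared for pattern carriage versus case status over chromosomes."""
    a = sum(1 for c, s in zip(carrier, is_case) if c and s)
    b = sum(1 for c, s in zip(carrier, is_case) if c and not s)
    c_ = sum(1 for c, s in zip(carrier, is_case) if not c and s)
    d = sum(1 for c, s in zip(carrier, is_case) if not c and not s)
    return chi_squared_2x2(a, b, c_, d)


def permutation_p_value(carrier, is_case, n_perm=1000, seed=0):
    """Empirical p-value obtained by shuffling the case/control labels."""
    rng = random.Random(seed)
    observed = pattern_statistic(carrier, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pattern_statistic(carrier, labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    carrier = [True] * 20 + [False] * 20
    is_case = [True] * 20 + [False] * 20
    print(pattern_statistic(carrier, is_case))
    print(permutation_p_value(carrier, is_case))
```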

Emerging Rules: LD patterns are highly structured (Daly et al. 2001); 5-8 markers (Niu et al. 2002; Zaykin et al. 2002; Toivonen et al. 2000); htSNPs (Johnson et al. 2001).

Problem of Haplotype Uncertainty: EM (Ceppellini et al. 1955); MCMC (Guo & Thompson 1992; Lazzeroni & Lange 1997; Stephens et al. 2001; Niu et al. 2002); heuristic algorithms.
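
The EM approach can be illustrated for the smallest non-trivial case, two biallelic SNPs, where only the double heterozygote has ambiguous phase. This is a minimal sketch under that assumption and not the implementation used in the talk; genotypes are coded as counts (0, 1, 2) of allele 1 at each locus, and the function name is hypothetical.

```python
from collections import Counter


def em_two_locus(genotypes, n_iter=100, tol=1e-12):
    """Estimate two-locus haplotype frequencies from unphased genotypes by EM.

    genotypes : non-empty list of (g1, g2), each the count (0, 1, 2) of allele 1
    Returns a dict mapping haplotypes (a1, a2) to estimated frequencies.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    n_chrom = 2 * len(genotypes)

    def alleles(g):
        # Allele pair carried at one locus for genotype count g.
        return (0, 0) if g == 0 else (1, 1) if g == 2 else (0, 1)

    # Count haplotypes whose phase is known (at most one heterozygous locus).
    known = Counter()
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        a, b = alleles(g1), alleles(g2)
        known[(a[0], b[0])] += 1
        known[(a[1], b[1])] += 1

    freq = {h: 0.25 for h in haps}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 rather than 01/10.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {}
        for h in haps:
            share = w if h in ((0, 0), (1, 1)) else 1.0 - w
            new[h] = (known[h] + n_double_het * share) / n_chrom
        change = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if change < tol:
            break
    return freq


if __name__ == "__main__":
    data = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    print(em_two_locus(data))
```

With more markers the number of compatible haplotype pairs per genotype grows quickly, which is where the data structures listed on the next slide become relevant.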

Haplotype Reconstruction: table of genotypes (Xie & Ott 1993); table of sufficient statistics (Zhao et al. 2000) and linked list; binary trees (Zhao & Sham 2002); mixed-radix number (Zhao & Sham 2003); QuickSort (Zhao & Qian, submitted).
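
One way to read the mixed-radix idea is that a multilocus haplotype is mapped to a single integer, with the number of alleles at each locus serving as the radix for that position. The sketch below shows a generic mixed-radix encoding and its inverse; it is an assumption about the general technique, not a reproduction of the cited method, and the function names are hypothetical.

```python
def encode(alleles, radices):
    """Map allele codes (0-based, one per locus) to a single integer index.

    radices[i] is the number of alleles at locus i.
    """
    index = 0
    for allele, radix in zip(alleles, radices):
        if not 0 <= allele < radix:
            raise ValueError("allele code out of range for its locus")
        index = index * radix + allele
    return index


def decode(index, radices):
    """Inverse of encode: recover the tuple of allele codes from the index."""
    alleles = []
    for radix in reversed(radices):
        index, allele = divmod(index, radix)
        alleles.append(allele)
    return tuple(reversed(alleles))


if __name__ == "__main__":
    radices = (2, 3, 4)  # a SNP, a 3-allele marker and a 4-allele marker
    code = encode((1, 2, 3), radices)
    print(code, decode(code, radices))
```

Because each haplotype becomes one integer, haplotypes can be counted, sorted or used as array indices without storing allele vectors.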

Examples: HLA (the evolution of EM algorithms, information content of SNP and SSR); ALDH2 (missing data, effectiveness of heuristic method); APOC (the disadvantage of QuickSort, heuristics, the inclusion of covariates).

Challenging Issues: genotype/phenotype relationship in the Whitehall II data (10,308 civil servants, with APOE genotyping); association with cognitive decline; need for longitudinal data; will tie up with the BioBank project.

Statistical Methodology: GLM needs to be extended; the same holds for LDA models such as GLMM; search and sort paradigm (Knuth).