Haplotype Blocks: An Overview
A. Polanski, Department of Statistics, Rice University

Key Papers
1. N. Patil et al. (2001), Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21, Science, vol. 294.
2. N. Wang et al. (2002), Distribution of Recombination Crossovers and the Origin of Haplotype Blocks: The Interplay of Population History, Recombination and Mutation, Am. J. Hum. Genet., vol. 71.
3. K. Zhang et al. (2002), A Dynamic Programming Algorithm for Haplotype Block Partitioning, PNAS, vol. 99.

Supplementary Papers
1. R. Hudson, N. Kaplan (1985), Statistical Properties of the Number of Recombination Events in the History of a Sample of DNA Sequences, Genetics, vol. 111.
2. R. Hudson (2002), Generating Samples under a Wright-Fisher Neutral Model of Genetic Variation, Bioinformatics, vol. 18.
3. D. Reich et al. (2001), Linkage Disequilibrium in the Human Genome, Nature, vol. 411.

What are Haplotype Blocks?
Haplotype block = a sequence of contiguous markers on DNA, homogeneous according to some criterion
Markers = Single Nucleotide Polymorphisms (SNPs)

Data (Patil et al. 2001)
Chromosome 21
The two copies of chromosome 21 were physically separated using a rodent-human somatic cell hybrid technique
Sample of 20 copies of chromosome 21 ( bases)
Found: SNPs

Fig. 2 from (Patil et al. 2001)

SNPs along the chromosome are indexed i = 1, 2, …

Problems

How do we determine boundaries between blocks?
1. The average value of the standardized coefficient of linkage disequilibrium is greater than some threshold (Wang et al. 2002, Reich et al. 2001)
2. Infer sites in the sample of DNA sequences where recombination events happened in the past (Wang et al. 2002, Hudson 2002)
3. Chromosome coverage: the minimum number of SNPs needed to account for the majority of haplotypes (Patil et al. 2001, Zhang et al. 2002)
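As a rough illustration of criterion 1, the sketch below computes a pairwise standardized LD coefficient from a small haplotype sample. It assumes the "standardized coefficient" is |D'| (D divided by its maximum attainable value given the allele frequencies); the slides do not name the statistic, and the function name `d_prime` and the string encoding of haplotypes are hypothetical.

```python
def d_prime(haplotypes, i, j):
    """|D'| between biallelic SNP columns i and j.

    `haplotypes` is a list of equal-length strings, one per chromosome.
    Assumption: the standardized LD coefficient meant on the slide is |D'|.
    """
    n = len(haplotypes)
    a = haplotypes[0][i]  # arbitrary reference allele at SNP i
    b = haplotypes[0][j]  # arbitrary reference allele at SNP j
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # A monomorphic SNP gives d_max == 0; report no disequilibrium in that case.
    return 0.0 if d_max == 0 else abs(d) / d_max
```

A block criterion of this kind would then average such values over pairs of SNPs in a candidate region and compare the average to a chosen threshold.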

What evolutionary forces are responsible for haplotype block formation?
Mutation
Genetic drift
Recombination
Recombination hot spots

Methods

Method 1 (Wang et al. 2002)
Infer sites in the sample of DNA sequences where recombination events happened in the past

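Three gamete condition
Consider a pair of SNPs, SNP1 and SNP2. If there was no recombination between SNP1 and SNP2 (and each SNP arose by a single mutation), they must satisfy the three gamete condition: at most three of the four possible gametes are observed.
Example with SNP1 alleles A/G and SNP2 alleles C/T: gametes AC, GC, GT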

Four gamete test (4GT) (Hudson and Kaplan, 1985)
If we see all four gametes at SNP1 and SNP2 (AC, GC, GT, AT), then there must have been a recombination event between these sites in their past history.

Array of pairwise 4GT test results (Hudson and Kaplan, 1985)
$D = [d_{ij}]$, where
$d_{ij} = \begin{cases} 0, & \text{if there are fewer than 4 gametes} \\ 1, & \text{if there are 4 gametes} \end{cases}$
What is the minimal number of recombinations that could explain the observed data?
Statistic $F_R$ (Hudson and Kaplan, 1985)
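The pairwise array can be sketched in a few lines. This is a minimal illustration, assuming haplotypes are given as equal-length strings over biallelic SNPs; the function name `four_gamete_matrix` is hypothetical and not taken from the cited papers.

```python
from itertools import combinations


def four_gamete_matrix(haplotypes):
    """Return D with D[i][j] = 1 if SNPs i and j show all four gametes, else 0.

    `haplotypes` is a list of equal-length strings, one per chromosome,
    and each SNP column is assumed to be biallelic.
    """
    m = len(haplotypes[0])
    d = [[0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        # Distinct allele pairs observed at the two sites.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            d[i][j] = d[j][i] = 1
    return d
```

For the sample `["ACA", "GCA", "GTA", "ATG"]`, only the first two SNPs show all four gametes (AC, GC, GT, AT), so only that pair is flagged.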

Fig. 1 from Wang et al., 2002: matrix D with SNPs grouped into Block 1, Block 2, Block 3
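One simple way to read blocks off the matrix D is a left-to-right scan that closes the current block as soon as the next SNP shows four gametes with any SNP already in it. This is only a sketch of that idea, not necessarily the exact procedure of Wang et al.; the function name `blocks_from_matrix` is hypothetical.

```python
def blocks_from_matrix(d):
    """Split SNPs 0..m-1 into maximal runs with no four-gamete pair inside.

    `d` is a symmetric 0/1 matrix where d[i][j] = 1 means SNPs i and j
    show all four gametes. Returns a list of (start, end) index pairs,
    both inclusive.
    """
    blocks, start = [], 0
    for j in range(1, len(d)):
        # SNP j conflicts with the current block: close it and start anew.
        if any(d[i][j] for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    if d:
        blocks.append((start, len(d) - 1))
    return blocks
```

With `d = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]` the scan yields one single-SNP block followed by a two-SNP block.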

Wang et al. Study
R. Hudson's program for simulating genealogies with mutation, drift and recombination under various demographic scenarios
Study of the dependence of average block length on different factors
Comparison of simulation results to data from Patil et al., 2001

Dependence of average lengths of blocks on recombination frequency

… on sample size

... on mutation intensity

Comparison to data from Patil et al. (2001)
Compute the distribution of haplotype block lengths in the data from Patil et al. (2001)
Try to tune the mutation parameter and the recombination parameter R to obtain a similar distribution in the simulations

… Failed

Try a mixture of two different recombination frequencies - better

Method 2 (Patil et al. 2001)
Chromosome coverage: the minimum number of SNPs needed to account for the majority of haplotypes

Fig. 2 from (Patil et al. 2001)

Problem formulation
Define block boundaries to minimize the number of SNPs that distinguish at least a threshold percentage of the haplotypes in each block

Common haplotypes
Those represented more than once in the block

Condition
Common haplotypes must constitute at least the threshold percentage, here 80 percent, of all haplotypes in the block
Blocks that do not satisfy this are not allowed

Fragment of Fig. 2 from Patil et al., 2001

Notation
B: block, defined as a set of consecutive SNP numbers, e.g., B = {45, 46, …, 50}, or B = {i, i+1, …, j}
L(B): length of the block (number of SNPs)
f(B): minimum number of SNPs required to distinguish the common haplotypes
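The quantities above can be made concrete with a small sketch. It assumes haplotypes are equal-length strings, treats "common" as "seen more than once in the block", and finds f(B) by brute force over SNP subsets, which is only practical for short blocks. All function names are hypothetical, and the cited papers may compute f(B) differently.

```python
from collections import Counter
from itertools import combinations


def common_haplotypes(haplotypes, start, end):
    """Patterns seen more than once in SNP columns start..end (inclusive)."""
    counts = Counter(h[start:end + 1] for h in haplotypes)
    return {pattern: c for pattern, c in counts.items() if c > 1}


def block_allowed(haplotypes, start, end, threshold=0.8):
    """True if common haplotypes cover at least `threshold` of the sample."""
    common = common_haplotypes(haplotypes, start, end)
    return sum(common.values()) >= threshold * len(haplotypes)


def min_distinguishing_snps(haplotypes, start, end):
    """f(B): fewest SNP columns that tell all common haplotypes apart."""
    patterns = list(common_haplotypes(haplotypes, start, end))
    width = end - start + 1
    for k in range(width + 1):  # try subset sizes in increasing order
        for cols in combinations(range(width), k):
            projected = {tuple(p[c] for c in cols) for p in patterns}
            if len(projected) == len(patterns):
                return k
    return width  # not reached: distinct patterns differ on the full column set
```

Note that with zero or one common haplotype this sketch returns f(B) = 0, since nothing needs to be distinguished.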

Greedy Solution
0. Fix Start = End
1. Increment End
2. Compute the ratio L(B)/f(B)
…
3. Stop at the maximum
4. Go to 0
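The scan above might be sketched as follows. The function takes `f(start, end)` and `allowed(start, end)` as caller-supplied callables (hypothetical names), so it stays independent of how those are computed. Two choices here are assumptions rather than part of the slide: f is clamped to at least 1 to avoid division by zero, and a SNP with no allowed block starting at it becomes a single-SNP block.

```python
def greedy_partition(n_snps, f, allowed):
    """Left-to-right greedy partition maximizing L(B)/f(B) per block.

    f(start, end) returns the number of representative SNPs for the block
    start..end (inclusive, 0-based); allowed(start, end) says whether that
    block satisfies the coverage condition.
    """
    blocks, start = [], 0
    while start < n_snps:
        best_end, best_ratio = start, 0.0
        for end in range(start, n_snps):
            if not allowed(start, end):
                continue
            # Assumption: clamp f to 1 so blocks with f == 0 get a finite ratio.
            ratio = (end - start + 1) / max(f(start, end), 1)
            if ratio > best_ratio:
                best_end, best_ratio = end, ratio
        blocks.append((start, best_end))
        start = best_end + 1  # the next block begins right after this one
    return blocks
```

Because each block is chosen without regard to what follows it, the result can use more representative SNPs than the best possible partition.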

Results
4563 representative SNPs (13%)
4135 blocks

Method 3 (Zhang et al. 2002)
Solves the same problem of 80% chromosome coverage, but with dynamic programming, which finds an optimal partition rather than a greedy one

Dynamic programming solution
Assume that for all i = 1, 2, …, j-1 we know the optimal block partition of SNPs 1, 2, …, i, denoted $B_1(i), B_2(i), \ldots, B_k(i)$, that minimizes the total number of representative SNPs, $\sum_{l=1}^{k} f(B_l(i))$.

Bellman’s equation
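A recurrence consistent with the notation above, offered as a sketch rather than the exact form in Zhang et al., is $S(0) = 0$ and $S(j) = \min_{1 \le i \le j} \left[ S(i-1) + f(\{i, \ldots, j\}) \right]$, where the minimum runs only over allowed blocks $\{i, \ldots, j\}$. The code below implements it with caller-supplied callables `f` and `allowed` (hypothetical names) and recovers the blocks by backtracking.

```python
def dp_partition(n_snps, f, allowed):
    """Partition minimizing the total f over blocks, by dynamic programming.

    f(start, end) and allowed(start, end) take 0-based inclusive indices.
    Returns (minimum total, list of (start, end) blocks).
    """
    inf = float("inf")
    best = [0] + [inf] * n_snps  # best[j]: optimum for the first j SNPs
    prev = [0] * (n_snps + 1)    # prev[j]: 0-based start of the last block
    for j in range(1, n_snps + 1):
        for i in range(1, j + 1):  # last block covers SNPs i..j (1-based)
            if best[i - 1] == inf or not allowed(i - 1, j - 1):
                continue
            cost = best[i - 1] + f(i - 1, j - 1)
            if cost < best[j]:
                best[j], prev[j] = cost, i - 1
    if best[n_snps] == inf:
        raise ValueError("no partition into allowed blocks exists")
    blocks, j = [], n_snps
    while j > 0:  # walk back through the stored block starts
        blocks.append((prev[j], j - 1))
        j = prev[j]
    return best[n_snps], blocks[::-1]
```

The double loop calls `f` on O(n²) candidate blocks, so the cost of evaluating f dominates in practice. Ties between partitions with the same total are broken here by scan order only.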

Results
3582 representative SNPs (compared to 4563 from the greedy algorithm)
2575 blocks (compared to 4135 blocks from the greedy algorithm)

Conclusions
Studying haplotype block partitions is very important for
1. Constructing haplotype maps for genetic traits
2. Understanding recombination in the human genome

What to expect
A lot of papers in this area appearing in scientific journals