Equilibria in populations

Slides:



Advertisements
Similar presentations
Note that the genetic map is different for men and women Recombination frequency is higher in meiosis in women.
Advertisements

Sampling distributions of alleles under models of neutral evolution.
Understanding GWAS Chip Design – Linkage Disequilibrium and HapMap Peter Castaldi January 29, 2013.
MALD Mapping by Admixture Linkage Disequilibrium.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Haplotyping via Perfect Phylogeny Conceptual Framework and Efficient (almost linear-time) Solutions Dan Gusfield U.C. Davis RECOMB 02, April 2002.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
CSE 291: Advanced Topics in Computational Biology Vineet Bafna/Pavel Pevzner
CSE182-L17 Clustering Population Genetics: Basics.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
PROCESS OF EVOLUTION I (Genetic Context). Since the Time of Darwin  Darwin did not explain how variation originates or passed on  The genetic principles.
Brachydactyly and evolutionary change
Population Genetics 101 CSE280Vineet Bafna. Personalized genomics April’08Bafna.
Population Genetics Learning Objectives
Haplotype Blocks An Overview A. Polanski Department of Statistics Rice University.
E QUILIBRIA IN POPULATIONS CSE280Vineet Bafna Population data Recall that we often study a population in the form of a SNP matrix – Rows.
CSE280Vineet Bafna CSE280a: Algorithmic topics in bioinformatics Vineet Bafna.
E QUILIBRIA IN POPULATIONS CSE280Vineet Bafna Population data Recall that we often study a population in the form of a SNP matrix – Rows.
National Taiwan University Department of Computer Science and Information Engineering Pattern Identification in a Haplotype Block * Kun-Mao Chao Department.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Estimating Recombination Rates. LRH selection test, and recombination Recall that LRH/EHH tests for selection by looking at frequencies of specific haplotypes.
Linear Reduction Method for Tag SNPs Selection Jingwu He Alex Zelikovsky.
The Hardy-Weinberg principle is like a Punnett square for populations, instead of individuals. A Punnett square can predict the probability of offspring's.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Association mapping for mendelian, and complex disorders January 16Bafna, BfB.
Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs
1.Stream A and Stream B are located on two isolated islands with similar characteristics. How do these two stream beds differ? 2.Suppose a fish that varies.
CSE280Vineet Bafna In a ‘stable’ population, the distribution of alleles obeys certain laws – Not really, and the deviations are interesting HW Equilibrium.
Coalescent theory CSE280Vineet Bafna Expectation, and deviance Statements such as the ones below can be made only if we have an underlying model that.
Estimating Recombination Rates. Daly et al., 2001 Daly and others were looking at a 500kb region in 5q31 (Crohn disease region) 103 SNPs were genotyped.
Association tests. Basics of association testing Consider the evolutionary history of individuals proximal to the disease carrying mutation.
The Hardy-Weinberg theorem describes the gene pool of a nonevolving population. This theorem states that the frequencies of alleles and genotypes in a.
1,3, ,
Lecture 6 Genetic drift & Mutation Sonja Kujala
Genetic Linkage.
Population Genetics direct extension of Mendel’s laws, molecular genetics, and the ideas of Darwin Instead of genetic transmission between individuals,
Constrained Hidden Markov Models for Population-based Haplotyping
Population Genetics.
Population Genetics As we all have an interest in genomic epidemiology we are likely all either in the process of sampling and ananlysising genetic data.
New Courses in the Fall Biodiversity -- Pennings
KEY CONCEPT Hardy-Weinberg equilibrium provides a framework for understanding how populations evolve.
POPULATION GENETICS The study of the allele frequencies in a population The Hardy Weinberg equilibrium “Allele and genotype frequencies in a population.
CSE 280A: Advanced Topics in Computational Molecular Biology
L4: Counting Recombination events
Genetic Linkage.
Recombination (Crossing Over)
Chapter 23 The Evolution of Populations
Washington State University
Patterns of Linkage Disequilibrium in the Human Genome
Estimating Recombination Rates
The ‘V’ in the Tajima D equation is:
Basic concepts on population genetics
Phasing of 2-SNP Genotypes Based on Non-Random Mating Model
The Evolution of Populations
Population genetics and Hardy-Weinberg
Vineet Bafna/Pavel Pevzner
5 Agents of evolutionary change
The coalescent with recombination (Chapter 5, Part 1)
Genetic Linkage.
Microevolution: Population Genetics and Hardy-Weinberg Equilibrium
Modern Evolutionary Biology I. Population Genetics
Chapter 23 – The Evolution of Populations
Washington State University
Hardy-Weinberg Principle
Outline Cancer Progression Models
Haplotypes at ATM Identify Coding-Sequence Variation and Indicate a Region of Extensive Linkage Disequilibrium  Penelope E. Bonnen, Michael D. Story,
Modern Evolutionary Biology I. Population Genetics
Haplotypes When the presence of two or more polymorphisms on a single chromosome is statistically correlated in a population, this is a haplotype Example.
The Evolution of Populations
Presentation transcript:

Equilibria in populations CSE280 Vineet Bafna

Basic Principles In a ‘stable’ population, the distribution of alleles obeys certain laws Not really, and the deviations are interesting HW Equilibrium (due to mixing in a population) Linkage (dis)-equilibrium Due to recombination CSE280 Vineet Bafna

Hardy-Weinberg principle Time (generations) Given: Population of diploid individuals and a locus with alleles, A & a 3 Genotypes: AA, Aa, aa A Q: Will the frequency of alleles and genotypes remain constant from generation to generation? a

To the Editor of Science: I am reluctant to intrude in a discussion concerning matters of which I have no expert knowledge, and I should have expected the very simple point which I wish to make to have been familiar to biologists. However, some remarks of Mr. Udny Yule, to which Mr. R. C. Punnett has called my attention, suggest that it may still be worth making... ………. A little mathematics of the multiplication-table type is enough to show ….the condition for this is q2 = pr. And since q12 = p1r1, whatever the values of p, q, and r may be, the distribution will in any case continue unchanged after the second generation

Hardy Weinberg equilibrium Suppose, Pr(A)=p, and Pr(a)=1-p=q If certain assumptions are met Large, diploid, population Discrete generations Random mating No selection,… Then, in every generation Aa AA aa

Hardy-Weinberg principle Time (generations) In the next generation A a

Hardy Weinberg: Generalization Multiple alleles with frequencies By HW, Multiple loci?

Hardy Weinberg: Implications The allele frequency does not change from generation to generation. True or false? It is observed that 1 in 10,000 caucasians have the disease phenylketonuria. The disease mutation(s) are all recessive. What fraction of the population carries the mutation? Males are 100 times more likely to have the ‘red’ type of color blindness than females. Why? Individuals homozygous for S have the sickle-cell disease. In an experiment, the ratios A/A:A/S: S/S were 9365:2993:29. Is HWE violated? Is there a reason for this violation? A group of individuals was chosen in NYC. Would you be surprised if HWE was violated? Conclusion: While the HW assumptions are rarely satisfied, the principle is still important as a baseline assumption, and significant deviations are interesting. CSE280 Vineet Bafna

A modern example SNP-chips can give us the genotype at each site based on hybridization. Plot the 3 genotypes at each locus on 3 separate horizontal lines. Genomic location 1/1 0/1 0/0 Genomic location 1/1 0/1 0/0 Zoomed Out Picture

A modern example of HW application SNP-chips can give us the allelic value at each polymorphic site based on hybridization. What is peculiar in the picture? What is your conclusion?

The power of HWE Violation of HWE is common in nature Non-HWE implies that some assumption is violated Figuring out the violated assumption leads to biological insight

Linkage Equilibrium HW equilibrium is the equilibrium in allele frequencies at a single locus. Linkage equilibrium refers to the equilibrium between allele occurrences at two loci. LE is due to recombination events First, we will consider the case where no recombination occurs. L.E. H.W.E

Perfect phylogeny and phylogeography

Before you start,… SNP matrix The infinite sites assumption Recombination Genealogy/phylogeny Basic data structures and algorithms Recombination and its lack in the y-chromosome and mtDNA

The y-chr (mtDNA) lineage is a tree Time

Reconstructing perfect phylogeny 1 3 4 2 8 5 6 7 12345678 01000000 00110110 00110100 00110000 10000000 10001000 10001001 We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later. In any tree, each node (except the root) has a single parent. It is sufficient to construct a parent for every node. In each step, we add a column and refine some of the nodes containing multiple children. Stop if all columns have been considered. We consider the directed (rooted) case. The root is all 0s, and all mutations are of the form 01

Columns 12345678 01000000 00110110 00110100 00110000 10000000 10001000 10001001 i A:0 B:1 C:1 D:1 E:0 F:0 G:0 Define i0: taxa (individuals) with a 0 at the i’th column Define i1: taxa (individuals) with a 1 at the i’th column CSE280 Vineet Bafna

Inclusion Property of Perfect Phylogeny For any pair of columns i,j, one of the following holds i1  j1 j1  i1 i1  j1 =  For any pair of columns i,j i < j if and only if i1  j1 Note that if i<j then the edge containing i is an ancestor of the edge containing j i For any pair of columns i,j, one of the folowing holds i1  j1 j1  i1 i1  j1 =  For any pair of columns i,j i < j if and only if i1  j1 Note that if i<j then the edge containing i is an ancestor of the edge containing j j CSE280 Vineet Bafna

Reconstruction of perfect phylogeny :12345678 A:01000000 B:00110110 C:00110100 D:00110000 E:10000000 F:10001000 G:10001001 Define i0: taxa (individuals) with a 0 at the i’th column Define i1: taxa (individuals) with a 1 at the i’th column CSE280 Vineet Bafna

Sort columns :12345678 A:01000000 B:00110110 C:00110100 D:00110000 E:10000000 F:10001000 G:10001001 A: B: C: D: E: F: G: 2 1 3 1 4 1 6 1 7 1 1 5 1 8 1 CSE280 Vineet Bafna

Add First column A: B: C: D: E: F: G: 2 1 3 4 6 7 5 8

Add First column A: B: C: D: E: F: G: 2 1 3 4 6 7 5 8

Columns 3,4,5 A: B: C: D: E: F: G: 2 1 3 4 6 7 5 8 A 2 B C D E F G

Columns A: B: C: D: E: F: G: 2 1 3 4 6 7 5 8 A 2 B C D E F G 6 3,4

A perfect phylogeny A: B: C: D: E: F: G: 2 1 3 4 6 7 5 8 A 2 B 6 C D 3 4 6 7 5 8 A B C D E F G 2 3,4 6 1 5 8

Perfect phylogeny-- unrooted case

A perfect phylogeny A: B: C: D: E: F: G: 2 1 3 4 6 7 5 8 A 2 B 6 C D 3 4 6 7 5 8 A B C D E F G 2 3,4 6 1 5 8

0s and 1s can be reversed A: B: C: D: E: F: G: 2 1 3 4 6 7 5 8 A 2 B 6 3 4 6 7 5 8 A B C D E F G 2 3,4 6 1 5 8

Unrooted case A: B: C: D: E: F: G: 2 1 3 1 4 1 6 1 7 1 1 5 1 8 1 4 1 3 1 4 1 6 1 7 1 1 5 1 8 1 4 1 Switch the values in each column, so that 0 is the majority element. Apply the algorithm for the rooted case. Relabel columns and individuals to the original values. CSE280 Vineet Bafna

Unrooted perfect phylogeny We transform matrix M to a 0-major matrix M0. if M0 has a directed perfect phylogeny, M has a perfect phylogeny. If M has a perfect phylogeny, does M0 have a directed perfect phylogeny?

Unrooted case Theorem: If M has a perfect phylogeny, there exists a relabeling, and a perfect phylogeny s.t. Root is all 0s For any SNP (column), #1s ≤ #0s All edges are mutated 01 CSE280 Vineet Bafna

Finding a ‘center’ of an unrooted phylogeny Is it possible to find a node or an edge so that none of the children have more than n/2 nodes?

Proof Consider the perfect phylogeny of M. Find the center: Root at the center, and direct all mutations from 01 away from the root. QED If the theorem is correct, then simply relabeling all columns so that the majority element is 0 is sufficient. CSE280 Vineet Bafna

‘Homework’ Problems What if there is missing data? (An entry that can be 0 or 1)? What if recurrent mutations are allowed (infinite sites is violated)? CSE280 Vineet Bafna

Linkage Disequilibrium CSE280 Vineet Bafna

Quiz Recall that a SNP data-set is a ‘binary’ matrix. Rows are individual (chromosomes) Columns are alleles at a specific locus Suppose you have 2 SNP datasets of a contiguous genomic region but no other information One from an African population, and one from a European Population. Can you tell which is which? How long does the genomic region have to be? CSE280 Vineet Bafna

Linkage (Dis)-equilibrium (LD) Consider sites A &B Case 1: No recombination Each new individual chromosome chooses a parent from the existing ‘haplotype’ A B 0 1 0 0 1 0 1 0 CSE280 Vineet Bafna

Linkage (Dis)-equilibrium (LD) Consider sites A &B Case 2: diploidy and recombination Each new individual chooses a parent from the existing alleles A B 0 1 0 0 1 0 1 1 CSE280 Vineet Bafna

Linkage (Dis)-equilibrium (LD) Consider sites A &B Case 1: No recombination Each new individual chooses a parent from the existing ‘haplotype’ Pr[A,B=0,1] = 0.25 Linkage disequilibrium Case 2: Extensive recombination Each new individual simply chooses and allele from either site Pr[A,B=(0,1)]=0.125 Linkage equilibrium A B 0 1 0 0 1 0 CSE280 Vineet Bafna

LD In the absence of recombination, With extensive recombination Correlation between columns The joint probability Pr[A=a,B=b] is different from P(a)P(b) With extensive recombination Pr(a,b)=P(a)P(b) CSE280 Vineet Bafna

Measures of LD Consider two bi-allelic sites with alleles marked with 0 and 1 Define P00 = Pr[Allele 0 in locus 1, and 0 in locus 2] P0* = Pr[Allele 0 in locus 1] Linkage equilibrium if P00 = P0* P*0 The D-measure of LD D = (P00 - P0* P*0) = -(P01 - P0* P*1) = … CSE280 Vineet Bafna

Other measures of LD D’ is obtained by dividing D by the largest possible value Suppose D = (P00 - P0* P*0) >0. Then the maximum value of Dmax= min{P0* P*1, P1* P*0} If D<0, then maximum value is max{-P0* P*0, -P1* P*1} D’ = D/ Dmax 1 D -D Site 1 1 -D D Site 2 CSE280 Vineet Bafna

Other measures of LD D’ is obtained by dividing D by the largest possible value Ex: D’ = abs(P11- P1* P*1)/ Dmax  = D/(P1* P0* P*1 P*0)1/2 Let N be the number of individuals Show that 2N is the 2 statistic between the two sites 1 P00N P0*N Site 1 1 Site 2 CSE280 Vineet Bafna

Digression: The χ2 test 1 O1 O2 O3 O4 1 The statistic behaves like a χ2 distribution (sum of squares of normal variables). A p-value can be computed directly

Observed and expected 1 1 P00N P01N P0*P*0N P0*P*1N Site 1 P10N P11N 1 1 P00N P01N P0*P*0N P0*P*1N Site 1 1 P10N P11N P1*P*0N P1*P*1N  = D/(P1* P0* P*1 P*0)1/2 Verify that 2N is the 2 statistic between the two sites

LD over time and distance The number of recombination events between two sites, can be assumed to be Poisson distributed. Let r denote the recombination rate between two adjacent sites r = # crossovers per bp per generation The recombination rate between two sites l apart is rl

LD over time Decay in LD Let D(t) = LD at time t between two sites r’=lr P(t)00 = (1-r’) P(t-1)00 + r’ P(t-1)0* P(t-1)*0 D(t) = P(t)00 - P(t)0* P(t)*0 = P(t)00 - P(t-1)0* P(t-1)*0 (Why?) D(t) =(1-r’) D(t-1) =(1-r’)t D(0) CSE280 Vineet Bafna

LD over distance Assumption Recombination rate increases linearly with distance and time LD decays exponentially. The assumption is reasonable, but recombination rates vary from region to region, adding to complexity This simple fact is the basis of disease association mapping. CSE280 Vineet Bafna

LD and disease mapping Consider a mutation that is causal for a disease. The goal of disease gene mapping is to discover which gene (locus) carries the mutation. Consider every polymorphism, and check: There might be too many polymorphisms Multiple mutations (even at a single locus) that lead to the same disease Instead, consider a dense sample of polymorphisms that span the genome CSE280 Vineet Bafna

LD can be used to map disease genes 1 LD decays with distance from the disease allele. By plotting LD, one can short list the region containing the disease gene. CSE280 Vineet Bafna

269 individuals ~1M SNPs 90 Yorubans 90 Europeans (CEPH) 44 Japanese 45 Chinese ~1M SNPs CSE280 Vineet Bafna

Haplotype blocks It was found that recombination rates vary across the genome How can the recombination rate be measured? In regions with low recombination, you expect to see long haplotypes that are conserved. Why? Typically, haplotype blocks do not span recombination hot-spots CSE280 Vineet Bafna

19q13 CSE280 Vineet Bafna

Long haplotypes Chr 2 region with high r2 value (implies little/no recombination) History/Genealogy can be explained by a tree ( a perfect phylogeny) Large haplotypes with high frequency CSE280 Vineet Bafna

LD variation across populations Reich et al. Nature 411, 199-204(10 May 2001) LD is maintained upto 60kb in swedish population, 6kb in Yoruban population CSE280 Vineet Bafna

Population specific recombination D’ was used as the measure between SNP pairs. SNP pairs were classified in one of the following Strong LD Strong evidence for recombination Others (13% of cases) Plot shows fraction of pairs with strong recombination (low LD) This roughly favors out-of-africa. A Coalescent simulation can help give confidence values on this. Gabriel et al., Science 2002 CSE280 Vineet Bafna

Understanding Population Data We described various population genetic concepts (HW, LD), and their applicability The values of these parameters depend critically upon the population assumptions. What if we do not have infinite populations No random mating (Ex: geographic isolation) Sudden growth Bottlenecks Ad-mixture It would be nice to have a simulation of such a population to test various ideas. How would you do this simulation? CSE280 Vineet Bafna