March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner www.cse.ucsd.edu/classes/sp05/cse291.


1 March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner www.cse.ucsd.edu/classes/sp05/cse291

2 March 2006Vineet Bafna Review: Hardy-Weinberg Equilibrium, Linkage Equilibrium

3 March 2006Vineet Bafna Recombination and Linkage Equilibrium In a freely mixing population, an individual chromosome randomly chooses its parent from the available pool. With unimpeded recombination (Linkage Equilibrium), the individual freely chooses its two parent chromosomes, and then freely chooses alleles from the parents. What is the probability of seeing a particular allele combination? [Figure: pool of two-site haplotypes over alleles 0 and 1]

4 March 2006Vineet Bafna Measures of LD
Consider two bi-allelic sites with alleles marked 0 and 1. Define
– $P_{00}$ = Pr[allele 0 at locus 1, and allele 0 at locus 2]
– $P_{0*}$ = Pr[allele 0 at locus 1]
Linkage equilibrium holds if $P_{00} = P_{0*} P_{*0}$.
$D = |P_{00} - P_{0*} P_{*0}| = |P_{01} - P_{0*} P_{*1}| = \dots$
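The definition of D can be checked on a small sample. The following is a minimal sketch, assuming haplotypes are given as two-character strings such as "01"; the function name `ld_d` and the two sample lists are hypothetical and not part of the slides.

```python
from collections import Counter


def ld_d(haplotypes):
    """Return D = |P00 - P0* P*0| for a list of two-site haplotypes like "01"."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p00 = counts["00"] / n
    # Marginal frequency of allele 0 at locus 1 (P0*) and at locus 2 (P*0).
    p0_any = sum(1 for h in haplotypes if h[0] == "0") / n
    p_any0 = sum(1 for h in haplotypes if h[1] == "0") / n
    return abs(p00 - p0_any * p_any0)


# Hypothetical samples: all four haplotypes equally common vs. only 00 and 11.
in_equilibrium = ["00", "01", "10", "11"]
in_full_ld = ["00", "00", "11", "11"]
print(ld_d(in_equilibrium))  # 0.0
print(ld_d(in_full_ld))      # 0.25
```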

5 March 2006Vineet Bafna LD over time
With random mating and a fixed recombination rate r between the sites, Linkage Disequilibrium will disappear over time.
– Let $D^{(t)}$ = LD at time t
– $P^{(t)}_{00} = (1-r) P^{(t-1)}_{00} + r P^{(t-1)}_{0*} P^{(t-1)}_{*0}$
– $D^{(t)} = P^{(t)}_{00} - P^{(t)}_{0*} P^{(t)}_{*0} = P^{(t)}_{00} - P^{(t-1)}_{0*} P^{(t-1)}_{*0}$ (HW: allele frequencies do not change)
– $D^{(t)} = (1-r) D^{(t-1)} = (1-r)^t D^{(0)}$
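The recursion can be iterated numerically and compared with the closed form $(1-r)^t D^{(0)}$. This is a small sketch under the slide's assumptions (random mating, constant allele frequencies); the function name and the starting values are illustrative only, and D is kept signed here rather than taken in absolute value.

```python
def ld_trajectory(p00, p0_any, p_any0, r, generations):
    """Iterate P00(t) = (1-r) P00(t-1) + r P0* P*0; return D(t) for t = 0..generations."""
    d_values = []
    for _ in range(generations + 1):
        d_values.append(p00 - p0_any * p_any0)
        # Non-recombinant chromosomes keep their haplotype; recombinants pair alleles at random.
        p00 = (1 - r) * p00 + r * p0_any * p_any0
    return d_values


# Hypothetical start: complete LD (only 00 and 11 haplotypes), r = 0.1.
d = ld_trajectory(p00=0.5, p0_any=0.5, p_any0=0.5, r=0.1, generations=50)
closed_form = [(1 - 0.1) ** t * d[0] for t in range(51)]
print(max(abs(a - b) for a, b in zip(d, closed_form)))  # differences at rounding-error level
```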

6 March 2006Vineet Bafna LD over distance
Assumption: recombination rate increases linearly with distance. Let r be the (constant) recombination rate per bp per generation. The assumption is reasonable, but recombination rates vary from region to region, adding to complexity.
For two sites d bp apart:
$D^{(t)} = P^{(t)}_{00} - P_{0*} P_{*0} = (1-r)^d P^{(t-1)}_{00} + [1 - (1-r)^d] P_{0*} P_{*0} - P_{0*} P_{*0} = (1-r)^d D^{(t-1)}$

7 March 2006Vineet Bafna LD and disease mapping Consider a mutation that is causal for a disease. The goal of disease gene mapping is to discover which gene (locus) carries the mutation. One option is to consider every polymorphism and check it, but: – There might be too many polymorphisms – Multiple mutations (even at a single locus) may lead to the same disease. Instead, consider a dense sample of polymorphisms that span the genome.

8 March 2006Vineet Bafna LD can be used to map disease genes LD decays with distance from the disease allele. By plotting LD, one can shortlist the region containing the disease gene. [Figure: marker alleles (011001011001) aligned with labels (DNNDDNDNNDDN), with LD plotted along the region]

9 March 2006Vineet Bafna LD and disease gene mapping problems Marker density? Complex diseases Population sub-structure

10 March 2006Vineet Bafna Population Genetics Often we look at these equilibria (Linkage/HW) and their deviations in specific populations These deviations offer insight into evolution. However, what is Normal? A combination of empirical (simulation) and theoretical insight helps distinguish between expected and unexpected.

11 March 2006Vineet Bafna Topic 2: Simulating population data We described various population genetic concepts (HW, LD), and their applicability The values of these parameters depend critically upon the population assumptions. – What if we do not have infinite populations – No random mating (Ex: geographic isolation) – Sudden growth – Bottlenecks – Ad-mixture It would be nice to have a simulation of such a population to test various ideas. How would you do this simulation?

12 March 2006Vineet Bafna Wright Fisher Model of Evolution Fixed population size from generation to generation Random mating
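A forward simulation of this model can be sketched in a few lines. The sketch below assumes a haploid population of fixed size and a single bi-allelic site with no mutation; the function name and parameters are illustrative and do not come from the slides.

```python
import random


def wright_fisher(n_individuals, initial_count, generations, seed=None):
    """Track the number of copies of allele 1 under Wright-Fisher random mating."""
    rng = random.Random(seed)
    count = initial_count
    trajectory = [count]
    for _ in range(generations):
        freq = count / n_individuals
        # Each child picks a parent uniformly at random, so it carries
        # allele 1 with probability equal to the current frequency.
        count = sum(1 for _ in range(n_individuals) if rng.random() < freq)
        trajectory.append(count)
    return trajectory


# Hypothetical run: 20 individuals, 10 copies of allele 1 at the start.
print(wright_fisher(20, 10, 30, seed=1))
```

In a finite population the count eventually drifts to 0 or to the full population size and then stays there, which is one reason the infinite-population equilibria need to be re-examined by simulation.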

13 March 2006Vineet Bafna Coalescent model Insight 1: – Separate the genealogy from allelic states (mutations) – First generate the genealogy (who begat whom) – Assign an allelic state (0) to the ancestor. Drop mutations on the branches.

14 March 2006Vineet Bafna Coalescent model Insight 1 (continued): – Assign an allelic state (0) to the ancestor. Drop mutations on the branches. – The number of mutations on a branch is proportional to the branch length. – Each site (locus) mutates at most once. – At the end, drop any fixed sites. How efficient is this? How many generations do we need to simulate to reach today's population? [Figure: individuals × loci matrix with rows 0110101101, 0011100111, 0110101101, 0000100001, 0000000000]

15 March 2006Vineet Bafna Coalescent theory Insight 2: – Much of the genealogy is irrelevant, because it disappears. – Better to go backwards

16 March 2006Vineet Bafna Coalescent Note that in the Wright-Fisher model, the population is freely mixing, and of constant size N. One way to think about it is that each individual in the current generation selects a parent uniformly at random from the N individuals in the previous generation. When two individuals choose the same parent, they coalesce. Once they coalesce, they stay together. We continue until only one individual is left. Note that this only gives a random topology with labeled leaves. The only thing of interest is branch length (number of generations to the MRCA). [Figure: genealogy with leaves labeled 1, 4, 3, 2]
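The backward process described here can be simulated directly, one generation at a time. This is a sketch under the slide's assumptions (constant size N, each lineage picks its parent uniformly at random); the function name is hypothetical. Lineages that pick the same parent are merged, so more than two may merge in one generation, exactly as in the Wright-Fisher model.

```python
import random


def generations_to_mrca(sample_size, population_size, seed=None):
    """Go back generation by generation until the sample has a single ancestor."""
    rng = random.Random(seed)
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # Every surviving lineage chooses a parent; identical choices coalesce.
        parents = {rng.randrange(population_size) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


# Hypothetical run: 4 sampled individuals in a population of 1000.
print(generations_to_mrca(4, 1000, seed=7))
```

The running time grows with N, which motivates the continuous-time approximation on the following slides.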

17 March 2006Vineet Bafna Coalescent theory (Kingman)
Input: fixed population (N individuals), random mating.
Consider 2 individuals.
– Probability that they coalesce in the previous generation (have the same parent) $= 1/N$
– Probability that they do not coalesce after t generations $= (1 - 1/N)^t \approx e^{-t/N}$

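18 March 2006Vineet Bafna Coalescent theory
Consider k individuals.
– Probability that no pair coalesces after 1 generation $= \prod_{i=1}^{k-1} (1 - i/N) \approx 1 - \binom{k}{2}/N$
– Probability that no pair coalesces after t generations $\approx (1 - \binom{k}{2}/N)^t \approx e^{-\binom{k}{2} \tau}$
$\tau = t/N$ is time in units of N generations.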
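A quick numerical check shows how close the exponential approximation is. The sketch assumes the exact one-generation probability is the product of $(1 - i/N)$ for $i = 1, \dots, k-1$ (each of the k individuals avoids the parents already chosen) and compares it with $e^{-\binom{k}{2} t / N}$; both function names are illustrative.

```python
import math


def no_coalescence_exact(k, n_pop):
    """Exact probability that k individuals pick k distinct parents out of n_pop."""
    prob = 1.0
    for i in range(1, k):
        prob *= 1 - i / n_pop
    return prob


def no_coalescence_approx(k, n_pop, t=1):
    """Exponential approximation for no coalescence among k individuals in t generations."""
    return math.exp(-math.comb(k, 2) * t / n_pop)


# Hypothetical values: k = 5 individuals, N = 10000.
print(no_coalescence_exact(5, 10000), no_coalescence_approx(5, 10000))
```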

19 March 2006Vineet Bafna Coalescent approximation Insight 3: – Topology is independent of coalescent times – If you have n individuals, generate a random binary topology: iterate (until one individual is left), picking a pair at random and coalescing it. Insight 4: – To generate coalescent times, there is no need to go back generation by generation. Generate n-1 random variables to get the n-1 coalescence times.

20 March 2006Vineet Bafna Coalescent approximation At any step, there are k individuals, 1 <= k <= n. To generate the time to coalesce (k to k-1 individuals): – Pick a number from an exponential distribution with rate k(k-1)/2 – Mean time to coalescence (in units of N generations) = 2/(k(k-1))
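Drawing the waiting times takes one exponential sample per coalescence. The following is a minimal sketch in units of N generations; the function name and the dictionary layout (keyed by the number of lineages k) are assumptions made for illustration.

```python
import random


def coalescence_times(n, seed=None):
    """Return {k: T_k}, the waiting time while k lineages remain, for k = n..2."""
    rng = random.Random(seed)
    times = {}
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # number of pairs that could coalesce
        times[k] = rng.expovariate(rate)
    return times


# Hypothetical run with n = 6; the sum of the values is the time to the MRCA.
times = coalescence_times(6, seed=1)
print(times, sum(times.values()))
```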

21 March 2006Vineet Bafna Mean time to coalesce If there are k individuals, the probability of a coalescence in one generation is approximately k(k-1)/(2N). Expected time to coalesce = 2N/(k(k-1)) generations.

22 March 2006Vineet Bafna Typical coalescents 4 random examples with n=6 (Note that we do not need to specify N. Why?) Expected time to coalesce to 1 node?

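23 March 2006Vineet Bafna Coalescent properties
– Expected time for the last step (2 individuals to 1): $E(T_2) = 1$, in units of N generations.
– The last step is about half of the total time to coalesce, since $E(T_{MRCA}) = \sum_{k=2}^{n} \frac{2}{k(k-1)} = 2(1 - 1/n)$.
– Studying a larger number of individuals does not change the numbers tremendously.
– Ex: the number of mutations in a population is proportional to the total branch length of the tree: $E(T_{tot}) = \sum_{k=2}^{n} k \cdot \frac{2}{k(k-1)} = 2 \sum_{i=1}^{n-1} \frac{1}{i}$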
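These expectations follow from the mean waiting time 2/(k(k-1)) while k individuals remain, and can be tabulated directly. The sketch below works in units of N generations; the function names are illustrative.

```python
def expected_tmrca(n):
    """Sum of mean waiting times 2/(k(k-1)) for k = n..2; equals 2(1 - 1/n)."""
    return sum(2 / (k * (k - 1)) for k in range(2, n + 1))


def expected_total_branch_length(n):
    """Each of the k lineages contributes T_k to the total branch length."""
    return sum(k * 2 / (k * (k - 1)) for k in range(2, n + 1))


# The last step (k = 2) has mean 1, so it accounts for at least half of the
# expected time to the MRCA for any sample size.
for n in (2, 6, 100):
    print(n, expected_tmrca(n), expected_total_branch_length(n))
```

The time to the MRCA is bounded by 2 however large the sample, while the total branch length grows only logarithmically, which is why sampling more individuals changes the numbers so little.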

24 March 2006Vineet Bafna Variants (exponentially growing populations) If the population is growing exponentially, the branch lengths become similar, and the tree can even become star-like. Why? With appropriate scaling of time, the same process can be extended to various scenarios: male-female, hermaphrodite, segregation, migration, etc.

25 March 2006Vineet Bafna Simulating population data – Generate a coalescent (topology + branch lengths) – On each branch, drop mutations with rate $\theta$ – Generate sequence data. Note that the resulting sequences admit a perfect phylogeny. Given such sequence data, can you reconstruct the coalescent tree? (Only the topology, not the branch lengths.) Also, note that all pairs of positions are correlated (should have high LD).
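The three steps can be combined into one short simulation. This is a sketch, not the course's implementation: it assumes every mutation creates a new site (each site mutates at most once), and that mutations fall on a branch at rate theta/2 per unit of scaled time, which is one common convention and an assumption here. The function name `simulate_sample` is hypothetical, and theta must be positive.

```python
import random


def simulate_sample(n, theta, seed=None):
    """Return n binary strings, one per individual, with one column per mutation."""
    rng = random.Random(seed)
    # Each active lineage is represented by the set of sampled individuals below it.
    lineages = [frozenset([i]) for i in range(n)]
    columns = []  # each mutation is stored as the set of individuals carrying it
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2)
        for leaves in lineages:
            # Mutations arrive along this branch segment as a Poisson process.
            elapsed = rng.expovariate(theta / 2)
            while elapsed < wait:
                columns.append(leaves)
                elapsed += rng.expovariate(theta / 2)
        # Coalesce a pair chosen uniformly at random.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)] + [merged]
    return ["".join("1" if i in col else "0" for col in columns) for i in range(n)]


# Hypothetical run: 6 individuals, theta = 4.
for row in simulate_sample(6, 4.0, seed=3):
    print(row)
```

Because mutations are only placed while at least two lineages remain, no site is fixed in the sample, and any two columns are either nested or disjoint.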

26 March 2006Vineet Bafna Coalescent with Recombination An individual may have one parent, or 2 parents

27 March 2006Vineet Bafna ARG: Coalescent with recombination Given: mutation rate $\theta$, recombination rate $\rho$, population size 2N (diploid), sample size n. How can you generate the ARG (topology + branch lengths) efficiently? How will you generate sequences for n individuals? Given sequence data, can you reconstruct the ARG (topology)?

28 March 2006Vineet Bafna Recombination Define r as the probability of recombining per generation. – Note that the parameter $\rho$ is a value which will be defined later. Assume k individuals in a generation. The following might happen: 1. An individual arises because of a recombination event between two individuals (it will have 2 parents). 2. Two individuals coalesce. 3. Neither (each individual has a distinct parent). 4. Multiple events (low probability).

29 March 2006Vineet Bafna Recombination
We ignore the case of multiple (> 1) events in one generation.
– Pr(No recombination) $= 1 - kr$
– Pr(No coalescence) $\approx 1 - \binom{k}{2} / (2N)$
Consider scaled time in units of 2N generations. Thus the number of individuals increases with rate $2Nkr$, and decreases with rate $\binom{k}{2}$.
The value 2rN is usually small, and therefore the process will ultimately coalesce to a single individual (MRCA).

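30 March 2006Vineet Bafna ARG
Let k = n. Define $\rho = 4Nr$, so that in scaled time each individual recombines at rate $2Nr = \rho/2$.
Iterate until k = 1:
– Choose the time to the next event from an exponential distribution with rate $k\rho/2 + \binom{k}{2}$
– Pick the event as a recombination with probability $\frac{k\rho/2}{k\rho/2 + \binom{k}{2}} = \frac{\rho}{\rho + k - 1}$
– If the event is a recombination, choose an individual to recombine, and a position; else choose a pair to coalesce.
– Update k, and continue.
What is the flaw in this procedure?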
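The event loop of this procedure can be sketched as a birth-death process on the number of individuals k. The sketch assumes the scaling $\rho = 4Nr$, so that in units of 2N generations recombinations occur at rate $k\rho/2$ and coalescences at rate $k(k-1)/2$; that scaling, the function name, and the event tuples are assumptions for illustration. It only records event types and times, not which individuals or positions are involved.

```python
import random


def simulate_arg_events(n, rho, seed=None):
    """Return a list of (time, event, k_after) tuples until a single individual remains."""
    rng = random.Random(seed)
    k, time, events = n, 0.0, []
    while k > 1:
        recomb_rate = k * rho / 2
        coal_rate = k * (k - 1) / 2
        time += rng.expovariate(recomb_rate + coal_rate)
        if rng.random() < recomb_rate / (recomb_rate + coal_rate):
            k += 1  # going backwards, a recombinant individual splits into two parents
            events.append((time, "recombination", k))
        else:
            k -= 1  # two individuals merge into a common parent
            events.append((time, "coalescence", k))
    return events


# Hypothetical run: 5 sampled individuals, rho = 0.5.
for event in simulate_arg_events(5, 0.5, seed=3):
    print(event)
```

Since the coalescence rate grows quadratically in k and the recombination rate only linearly, the loop terminates with probability 1, although it can take many events when rho is large.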

31 March 2006Vineet Bafna Ancestral Recombination Graph

32 March 2006Vineet Bafna Simulating sequences on the ARG Generate topology and branch lengths as before For each recombination, generate a position. Next generate mutations at random on branch lengths – For a mutation, select a position as well.

33 March 2006Vineet Bafna Review: Coalescent theory applications Coalescent simulations allow us to test various hypotheses. The coalescent/ARG is usually not inferred, unlike in phylogenies.

34 March 2006Vineet Bafna Coalescent theory: example Ex: ~1400 bp at the Sod locus in Drosophila. – 10 taxa – 5 were identical. The other 5 had 55 mutations. – Q: Is this a chance event, or is there selection for this haplotype?

35 March 2006Vineet Bafna Coalescent application – 10000 coalescent simulations were performed on 10 taxa. – 55 mutations on the coalescent branches – Count the number of times 5 lineages are identical – The event happened in 1.1% of the cases. – Conclusion: selection, or some other mechanism explains this data.
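A test of this kind can be sketched as a Monte Carlo loop. The sketch below is one possible reading of the procedure, not the original analysis: it generates a random coalescent for n taxa, places a fixed number of mutations on branch segments with probability proportional to their length, and counts how often at least `group` sampled sequences are identical. All function names and defaults are assumptions, and no particular output value is implied.

```python
import random
from collections import Counter


def largest_identical_group(n, n_mutations, rng):
    """Simulate one coalescent with a fixed number of mutations; return the largest set of identical taxa."""
    lineages = [frozenset([i]) for i in range(n)]
    branches = []  # (segment length, taxa below the segment)
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2)
        branches.extend((wait, leaves) for leaves in lineages)
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] | lineages[b]
        lineages = [l for i, l in enumerate(lineages) if i not in (a, b)] + [merged]
    # Each mutation lands on a segment with probability proportional to its length.
    weights = [length for length, _ in branches]
    hits = rng.choices(branches, weights=weights, k=n_mutations)
    haplotypes = [
        frozenset(j for j, (_, leaves) in enumerate(hits) if i in leaves)
        for i in range(n)
    ]
    return max(Counter(haplotypes).values())


def fraction_with_identical_group(n=10, n_mutations=55, group=5, trials=1000, seed=0):
    """Fraction of simulated samples in which at least `group` taxa are identical."""
    rng = random.Random(seed)
    hits = sum(largest_identical_group(n, n_mutations, rng) >= group for _ in range(trials))
    return hits / trials


print(fraction_with_identical_group(trials=1000))
```

A small fraction would suggest that the observed pattern is unlikely under the neutral coalescent alone.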

36 March 2006Vineet Bafna Coalescent example: Out of Africa hypothesis Looking at lineage specific mutations might help discard the candelabra model. How? How do we decide between the multi-regional and Out-of-Africa model? How do we decide if the ancestor was African?

37 March 2006Vineet Bafna Human Samples We look at data from human samples (Gabriel et al., Science 2002). – 3 populations were sampled at multiple regions spanning the genome – 54 regions (average size 250 kb) – SNP density: 1 per 2 kb – 90 individuals from Nigeria (Yoruban) – 93 Europeans – 42 Asians – 50 African Americans

38 March 2006Vineet Bafna Population specific recombination D’ was used as the measure of LD between SNP pairs. SNP pairs were classified into one of the following: – Strong LD – Strong evidence for recombination – Others (13% of cases). This roughly favors Out-of-Africa. A coalescent simulation can help give confidence values on this. (Gabriel et al., Science 2002)

39 March 2006Vineet Bafna Recombination events and $\rho$ Given $\rho$ and n, can you compute the expected number of recombination events? It can be shown that $E(n, \rho) = \rho \log(n)$. Questions that people are interested in: – Given a set of sequences from a population, compute the recombination rate $\rho$ – Given a population, reconstruct the most likely history (as an ancestral recombination graph)

40 March 2006Vineet Bafna Re-constructing history without the coalescent

41 March 2006Vineet Bafna An algorithm for constructing a perfect phylogeny We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This will be fixed later. In any tree, each node (except the root) has a single parent. – It is sufficient to construct a parent for every node. In each step, we add a column and refine some of the nodes containing multiple children. Stop if all columns have been considered.

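42 March 2006Vineet Bafna Inclusion Property For any pair of columns i, j: – i < j if and only if $i_1 \supseteq j_1$, where $i_1$ denotes the set of rows with a 1 in column i. Note that if i < j then the edge containing i is an ancestor of the edge containing j. [Figure: edge i above edge j]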

43 March 2006Vineet Bafna Example
    1 2 3 4 5
A   1 1 0 0 0
B   0 0 1 0 0
C   1 1 0 1 0
D   0 0 1 0 1
E   1 0 0 0 0
Initially, there is a single clade r, and each node (A, B, C, D, E) has r as its parent.

44 March 2006Vineet Bafna Sort columns Sort columns according to the inclusion property (note that the columns are already sorted here). This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order. In the example matrix (rows A to E), columns 1 to 5 read 10101, 10100, 01010, 00100, 00010 from top to bottom.
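The sorting step can be sketched on the example matrix as follows. Column indices are 0-based in the code (column 1 of the slide is index 0), and the helper names are illustrative. The second function checks that every pair of columns is nested or disjoint, the condition the sorted order relies on.

```python
EXAMPLE = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 1, 0],
    "D": [0, 0, 1, 0, 1],
    "E": [1, 0, 0, 0, 0],
}


def column_value(rows, j):
    """Read column j as a binary number, most significant bit in the first row."""
    return int("".join(str(rows[name][j]) for name in rows), 2)


def sort_columns(rows):
    """Return column indices in decreasing order of their binary value."""
    n_columns = len(next(iter(rows.values())))
    return sorted(range(n_columns), key=lambda j: column_value(rows, j), reverse=True)


def nested_or_disjoint(rows):
    """True if, for every pair of columns, the sets of rows with a 1 are nested or disjoint."""
    n_columns = len(next(iter(rows.values())))
    sets = [{name for name in rows if rows[name][j] == 1} for j in range(n_columns)]
    return all(a <= b or b <= a or not (a & b) for a in sets for b in sets)


print([column_value(EXAMPLE, j) for j in range(5)])  # [21, 20, 10, 4, 2]
print(sort_columns(EXAMPLE))                         # [0, 1, 2, 3, 4]
print(nested_or_disjoint(EXAMPLE))                   # True
```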

45 March 2006Vineet Bafna Add first column In adding column i: – Check each edge and decide which side you belong to. – Finally, add a node if you can resolve a clade. [Figure: the example tree after adding column 1; a new node u is created below the root r]

46 March 2006Vineet Bafna Adding other columns Add other columns on edges using the ordering property. [Figure: final tree for the example matrix, rooted at r, with leaves A, B, C, D, E and edges labeled 1, 2, 4, 3, 5]
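The whole rooted construction can be sketched compactly. This is a parent-pointer variant, assuming 0 is the ancestral state: after sorting the columns, each row's 1-columns, read in sorted order, must form its path from the root, so every column must always have the same predecessor. The function name and the return format are assumptions, and column indices are 0-based.

```python
def perfect_phylogeny(rows):
    """Return (parent, leaf_edge) or None if no rooted perfect phylogeny exists.

    parent[j] is the column labeling the edge above edge j (None means the root);
    leaf_edge[name] is the lowest labeled edge on the path from the root to that row.
    """
    names = list(rows)
    n_columns = len(rows[names[0]])
    # Sorting columns as bit lists in decreasing order matches the binary-number order.
    order = sorted(range(n_columns), key=lambda j: [rows[x][j] for x in names], reverse=True)
    parent, leaf_edge = {}, {}
    for name in names:
        previous = None
        for j in order:
            if rows[name][j] == 1:
                # The edge for column j must hang below the same edge in every row.
                if parent.setdefault(j, previous) != previous:
                    return None
                previous = j
        leaf_edge[name] = previous
    return parent, leaf_edge


example = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 1, 0],
    "D": [0, 0, 1, 0, 1],
    "E": [1, 0, 0, 0, 0],
}
print(perfect_phylogeny(example))
# ({0: None, 1: 0, 2: None, 3: 1, 4: 2}, {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 0})
```

In slide numbering this reads: edges 1 and 3 leave the root, edge 2 lies below 1, edge 4 below 2, and edge 5 below 3.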

47 March 2006Vineet Bafna Unrooted case Switch the values in each column, so that 0 is the majority element. Apply the algorithm for the rooted case
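The column switch for the unrooted case can be sketched as a preprocessing step. The sketch flips every column in which 1 is the strict majority, so that 0 becomes the majority value; the function name and the small input are hypothetical, and ties are left unflipped as an arbitrary choice.

```python
def to_majority_zero(rows):
    """Flip each column where 1s are the strict majority, so 0 is the majority element."""
    names = list(rows)
    n_columns = len(rows[names[0]])
    flip = [2 * sum(rows[x][j] for x in names) > len(names) for j in range(n_columns)]
    return {x: [bit ^ f for bit, f in zip(rows[x], flip)] for x in names}


# Hypothetical input: column 0 has two 1s out of three rows, so it is flipped.
print(to_majority_zero({"A": [1, 0], "B": [1, 0], "C": [0, 1]}))
# {'A': [0, 0], 'B': [0, 0], 'C': [1, 1]}
```

The output can then be passed to the algorithm for the rooted case.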
