Goals: To identify subpopulations (subsets of the sample with distinct allele frequencies); to assign individuals (probabilistically) to subpopulations.


1 Goals
To identify subpopulations (subsets of the sample with distinct allele frequencies)
To assign individuals (probabilistically) to subpopulations

2 Ancestry models
No admixture: each individual is derived completely from a single subpopulation.
Admixture: individuals may have mixed ancestry; some fraction $q_k$ of the genome of individual $i$ is derived from subpopulation $k$.

3 Typical data
Individual  Locus 1  Locus 2  Locus 3  Locus 4
1           A,A      A,A      A,C      A,A
2           A,B      A,A      A,B      A,A
3           B,B      A,B      A,A      A,A
4           C,C      D,E      D,E      B,C
5           C,C      C,D      D,D      B,D
6           B,C      E,E      A,E      C,E
7           A,C      D,D      C,D      A,D
{A,B,C,D,E} are labels for the different alleles at each locus.
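One way to hold data like this in code is a nested list indexed by individual, locus, and allele copy. The sketch below is an illustration only: it assumes the 28 genotypes on the slide are read row by row as 7 individuals by 4 loci, and the names `RAW`, `GENOTYPES`, and `allele_counts` are hypothetical, not from the presentation.

```python
from collections import Counter

# Genotypes from the slide, read row by row (assumed layout: 7 individuals x 4 loci).
RAW = """
A,A A,A A,C A,A
A,B A,A A,B A,A
B,B A,B A,A A,A
C,C D,E D,E B,C
C,C C,D D,D B,D
B,C E,E A,E C,E
A,C D,D C,D A,D
"""

# GENOTYPES[i][j] is a tuple holding the two allele copies of individual i at locus j.
GENOTYPES = [
    [tuple(cell.split(",")) for cell in row.split()]
    for row in RAW.strip().splitlines()
]


def allele_counts(genotypes, locus):
    """Count allele copies at one locus across all individuals."""
    counts = Counter()
    for individual in genotypes:
        counts.update(individual[locus])
    return counts


# Counts for the first locus (index 0); every individual contributes two copies.
print(allele_counts(GENOTYPES, 0))
```

Storing allele copies as ordered pairs keeps the layout close to the $X$ notation used for the model, where each copy is indexed separately.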

4 More on the model...
Let $P_1, P_2, \ldots, P_K$ represent the (unknown) allele frequencies in each subpopulation.
Let $Z_1, Z_2, \ldots, Z_m$ represent the (unknown) subpopulations of origin of the $m$ sampled individuals (no-admixture model).
Let $X_{ija}$ be the genotype data for allele copy $a \in \{1, 2\}$ of individual $i$ at locus $j$.
Assuming Hardy-Weinberg and linkage equilibrium within subpopulations, the likelihood of the genotype $G_i$ of individual $i$, given that the individual comes from subpopulation $k$, is the product of the relevant allele frequencies:
$$\Pr(G_i \mid Z_i = k, P_k) = \prod_{\text{loci } j} p^{(k)}_{X_{ij1}} \, p^{(k)}_{X_{ij2}}$$
Here $p^{(k)}_{X_{ija}}$ is the frequency, in subpopulation $k$, of the allele observed at copy $a$ of locus $j$.
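The product over loci can be computed directly; a minimal sketch follows, working in log space to avoid underflow when there are many loci. The function name `genotype_log_likelihood`, the dictionary-per-locus layout, and the example frequencies are assumptions made for illustration.

```python
import math


def genotype_log_likelihood(genotype, freqs_k):
    """Return log Pr(G_i | Z_i = k, P_k) under HWE and linkage equilibrium.

    genotype: list over loci of (allele, allele) tuples for one individual.
    freqs_k:  list over loci of dicts mapping allele -> frequency in subpopulation k.
    Assumes every observed allele has a strictly positive frequency.
    """
    log_lik = 0.0
    for (copy1, copy2), p in zip(genotype, freqs_k):
        # Each locus contributes the product of its two allele frequencies.
        log_lik += math.log(p[copy1]) + math.log(p[copy2])
    return log_lik


# Hypothetical frequencies for one subpopulation at two loci.
freqs = [{"A": 0.8, "B": 0.2}, {"A": 0.5, "C": 0.5}]
g = [("A", "B"), ("A", "A")]

# 0.8 * 0.2 * 0.5 * 0.5, i.e. approximately 0.04
print(math.exp(genotype_log_likelihood(g, freqs)))
```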

5 Then, adopting a Bayesian framework, we can write down the probability that individual $i$ is from subpopulation $k$:
$$\Pr(Z_i = k \mid G_i, P) = \frac{\Pr(G_i \mid P_k, Z_i = k)\,\Pr(Z_i = k)}{\sum_{\text{pops } j} \Pr(G_i \mid P_j, Z_i = j)\,\Pr(Z_i = j)}$$
Here, $\Pr(Z_i = k)$ gives the prior probability that individual $i$ is from subpopulation $k$, and the sum in the denominator runs over all subpopulations $j$. Assigning each individual at random to a population according to these probabilities is an example of Gibbs sampling.
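As a small illustration of this step, the sketch below normalizes likelihood-times-prior over the subpopulations and then draws an assignment at random from the resulting probabilities. The function names and the numbers in the example are hypothetical.

```python
import random


def assignment_probs(likelihoods, priors):
    """Return Pr(Z_i = k | G_i, P) for each k via Bayes' rule.

    likelihoods[k] is Pr(G_i | P_k, Z_i = k); priors[k] is Pr(Z_i = k).
    """
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(joint)
    return [x / total for x in joint]


def sample_assignment(probs, rng=random):
    """Draw one subpopulation index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical likelihoods for K = 2 subpopulations with a uniform prior.
liks = [0.04, 0.01]
priors = [0.5, 0.5]
probs = assignment_probs(liks, priors)
print(probs)                     # roughly [0.8, 0.2]
print(sample_assignment(probs))  # 0 about 80% of the time, 1 otherwise
```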

6 Similarly, a natural estimate of the allele frequencies in subpopulation $k$ is:
$$\text{frequency of allele } j \text{ at locus } l \text{ in pop } k = \frac{\#\text{ copies of allele } j \text{ at locus } l \text{ in individuals from } k}{2 \times (\#\text{ individuals from subpopulation } k)}$$
But because we are Bayesian, and doing MCMC, we instead sample the frequency from a posterior distribution that also depends on the prior.
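The contrast between the counting estimate and a posterior draw can be sketched as follows. The slides do not state which prior is used; this example assumes a symmetric Dirichlet prior, for which the posterior is again Dirichlet with the allele counts added to the prior parameter. The function names and the `prior` parameter are illustrative.

```python
import random
from collections import Counter


def count_alleles(genotypes_k, locus):
    """Count allele copies at one locus among individuals assigned to subpopulation k."""
    counts = Counter()
    for individual in genotypes_k:
        counts.update(individual[locus])
    return counts


def naive_freqs(genotypes_k, locus):
    """Counting estimate: copies of each allele / (2 * number of individuals)."""
    counts = count_alleles(genotypes_k, locus)
    return {a: c / (2 * len(genotypes_k)) for a, c in counts.items()}


def sample_freqs(genotypes_k, locus, alleles, prior=1.0, rng=random):
    """Draw frequencies from Dirichlet(prior + counts); the Dirichlet prior is an assumption."""
    counts = count_alleles(genotypes_k, locus)
    # A Dirichlet draw is a vector of independent Gamma draws, normalized to sum to 1.
    draws = [rng.gammavariate(prior + counts[a], 1.0) for a in alleles]
    total = sum(draws)
    return {a: d / total for a, d in zip(alleles, draws)}


# Two hypothetical individuals currently assigned to subpopulation k, one locus.
members = [[("A", "A")], [("A", "B")]]
print(naive_freqs(members, 0))                     # {'A': 0.75, 'B': 0.25}
print(sample_freqs(members, 0, ["A", "B", "C"]))   # random; allele C gets nonzero mass
```

Unlike the counting estimate, the posterior draw gives positive frequency to alleles not yet seen in the subpopulation, which keeps the genotype likelihood from collapsing to zero.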

7 MCMC algorithm (for fixed K)
Start at random initial values $Z^{(0)}$ for the population assignments. Then iterate the following steps for $n = 1, 2, \ldots$:
Step 1: Sample $P^{(n)}$ from $\Pr(P \mid Z^{(n-1)}, G)$
Step 2: Sample $Z^{(n)}$ from $\Pr(Z \mid P^{(n)}, G)$
For large $n$, the distribution of $(P^{(n)}, Z^{(n)})$ converges to the joint posterior distribution $\Pr(P, Z \mid G)$.
Estimation of $K$ is performed separately (approximately).
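The two alternating steps can be put together in a toy sampler. This is a sketch under stated assumptions rather than the implementation behind the slides: it uses the no-admixture model, a uniform prior $\Pr(Z_i = k) = 1/K$, a symmetric Dirichlet prior on allele frequencies, and the same allele labels at every locus. The function name, parameters, and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def gibbs_no_admixture(genotypes, alleles, K, n_iter, prior=1.0, seed=0):
    """Toy Gibbs sampler for fixed K; returns the sampled assignment vectors.

    genotypes[i][j] is an (allele, allele) tuple for individual i at locus j.
    alleles lists the labels assumed possible at every locus.
    """
    rng = random.Random(seed)
    n_ind, n_loci = len(genotypes), len(genotypes[0])
    z = [rng.randrange(K) for _ in range(n_ind)]  # random initial assignments
    samples = []
    for _ in range(n_iter):
        # Step 1: sample P given Z and G, one Dirichlet draw per (k, locus).
        P = []
        for k in range(K):
            P_k = []
            for j in range(n_loci):
                counts = Counter()
                for i in range(n_ind):
                    if z[i] == k:
                        counts.update(genotypes[i][j])
                draws = [rng.gammavariate(prior + counts[a], 1.0) for a in alleles]
                total = sum(draws)
                P_k.append({a: d / total for a, d in zip(alleles, draws)})
            P.append(P_k)
        # Step 2: sample Z given P and G; the uniform prior cancels in the ratio.
        for i in range(n_ind):
            logs = []
            for k in range(K):
                logs.append(sum(math.log(P[k][j][a])
                                for j in range(n_loci)
                                for a in genotypes[i][j]))
            top = max(logs)  # subtract the maximum before exponentiating
            weights = [math.exp(v - top) for v in logs]
            z[i] = rng.choices(range(K), weights=weights, k=1)[0]
        samples.append(list(z))
    return samples


# Hypothetical toy data: 4 individuals, 2 loci.
toy = [
    [("A", "A"), ("A", "B")],
    [("A", "B"), ("A", "A")],
    [("C", "C"), ("C", "B")],
    [("C", "C"), ("C", "C")],
]
samples = gibbs_no_admixture(toy, ["A", "B", "C"], K=2, n_iter=50)
print(samples[-1])  # one sampled assignment vector, e.g. a list of 0s and 1s
```

In practice the early iterations would be discarded as burn-in, and the assignment probabilities estimated from the remaining samples.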

8 Example: Taita Thrush data
Three main sampling locations in Kenya
Low migration rates (radio-tagging study)
155 individuals, genotyped at 7 microsatellite loci

9 Neighbor-joining tree of data

15 Since 2000
Model more features of heredity
Infer more complex histories
Faster algorithms
Better data summaries
More data

